
Introduction

0. Opening the data

Loading data

  • First, I installed and loaded the knitr package to create HTML, PDF or Word outputs when knitting my R Markdown file. I also loaded the pander package for better presentation
  • The dplyr package was installed for easier manipulation of the data, such as filtering or creating new variables, and lubridate for easier handling of dates and times
  • Then, I installed the readxl package to import my dataset, which is called Box Experiments.xls
  • This dataset contains information related to my master's thesis project. I used CyberTracker to record the behaviors of dyads of vervet monkeys in a box experiment on tolerance from September 2022 to September 2023
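The setup described above could look like the following chunk. This is a sketch: it assumes the packages are already installed and that the Excel file sits in the working directory.

```r
library(knitr)     # knitting the R Markdown file to HTML, PDF or Word
library(pander)    # nicer presentation of tables
library(dplyr)     # data manipulation (filter, mutate, select, ...)
library(lubridate) # handling of dates and times
library(readxl)    # importing Excel files

# Import the raw dataset
Boxex <- read_excel("Box Experiments.xls")
```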

1. Exploring the data

Description of the initial dataset - “Boxex”

## Glimpse of the Box Experiment dataset:
## Rows: 2,795
## Columns: 20
## $ Date                  <dttm> 2022-09-27, 2022-09-27, 2022-09-27, 2022-09-27,…
## $ Time                  <dttm> 1899-12-31 09:47:50, 1899-12-31 09:50:07, 1899-…
## $ Data                  <chr> "Box Experiment", "Box Experiment", "Box Experim…
## $ Group                 <chr> "Baie Dankie", "Baie Dankie", "Baie Dankie", "Ba…
## $ GPSS                  <chr> "-28.010549999999999", "-28.010549999999999", "-…
## $ GPSE                  <chr> "31.191050000000001", "31.191050000000001", "31.…
## $ MaleID                <chr> "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge",…
## $ FemaleID              <chr> "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", …
## $ `Male placement corn` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ MaleCorn              <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ FemaleCorn            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ DyadDistance          <chr> "2m", "2m", "1m", "1m", "0m", "0m", "0m", "0m", …
## $ DyadResponse          <chr> "Tolerance", "Tolerance", "Tolerance", "Toleranc…
## $ OtherResponse         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Audience              <chr> "Obse; Oup; Sirk", "Obse; Oup; Sirk", "Oup; Sirk…
## $ IDIndividual1         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ IntruderID            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Sey…
## $ Remarks               <chr> NA, NA, "Nge box did not open because of the bat…
## $ Observers             <chr> "Josefien; Michael; Ona; Zonke", "Josefien; Mich…
## $ DeviceId              <chr> "{7A4E6639-7387-7648-88EC-7FD27A0F258A}", "{7A4E…
  • I am now using the View function to look at the entire dataset and glimpse to display a summary of my dataset

  • I have 20 variables (here columns) and 2795 trials (here rows)

  • I will now make a brief summary of each variable and its use before creating a new dataframe (df) with my variables of interest, which I will call Bex

  • The highlighted variables are the ones I will use for Bex. I will then clean the data before heading to the statistical analysis and the interpretation of the results

Variables of Boxex

  • Date : “Date” is in a POSIXct format, which is appropriate for the display of dates

    • I want to use the date to know how many sessions have been done with each dyad in my experiment.
    • I will create a variable called Session where 1 session = 1 day
    • The data has values from the 14th of September 2022 until the 13th of September 2023
    • I may consider separating the 12 months of data into 4 seasons to make a preliminary check for a potential effect of seasonality. Nevertheless, since we did not use any tools to measure the weather, temperature, humidity or food availability (also related to seasonality and weather), categorizing my data into 4 seasons without further data would be quite arbitrary. If I end up doing it in my report, it will be done without any intention to include it in my scientific analysis or my scientific report.
  • Time : “Time” is coded in a POSIXct format

    • I do not plan to use this variable, but we can see that “Time” displays the correct hours together with an incorrect date.
    • (In case I wanted to observe when the trials occurred during the day, as time may have an influence on their behavior (Isbell & Young 1993)), I would need to correct the incorrect display of the date in the dataset.
    • This variable could also be useful to see when the seasonal effect took place, as we only went out in the morning during summer because of the heat, while in winter we went later and stayed longer in the field to do the box experiment
    • For now, the values in “Time” are all on the same (wrong) day, which is the 31st of December 1899
    • Note: I first did not intend to keep Time in Bex, but I needed this variable to see the order of the trials within a day. I finally decided to keep it.
  • Data : chr “Data” is coded as character

    • It describes the type of data being recorded in the software CyberTracker. We installed the software on tablets to record the different behaviors of vervet monkeys at our research center
    • In our case, my data was recorded in CyberTracker as Box Experiment, as we created a form specifically for this experiment
    • For this reason we can remove this column, since the information it contains is unnecessary and redundant
  • Group : chr The data is coded in r as a character

    • It describes the group of monkeys in which we did the trial
    • I will keep this column to see the number of trials that we did in the 3 groups of monkeys, which are Baie Dankie (BD), Ankhase (AK), and Noha (NH)
  • GPSS : chr “GPSS” is coded as character, although it contains numerical coordinates

    • It gives the south coordinates at which we started the experiment
    • I do not plan to use coordinates nor look at locations, so I will remove this column
  • GPSE : chr “GPSE” is coded as character, although it contains numerical coordinates

    • It gives the east coordinates at which we started the experiment
    • I do not plan to use coordinates nor look at locations, so I will remove this column
  • MaleID : chr “MaleID” is coded as character

    • It indicates the name of the male involved in the trial
    • I plan to use this to see how factors related to the individual may influence the experiment (age, sex, rank)
    • It will also help me see which behaviour was displayed by each individual (here males)
  • FemaleID : chr “FemaleID” is coded as character

    • It indicates the name of the female involved in the trial
    • I plan to use this variable in the same way as “MaleID”
    • It will also help me see which behaviour was displayed by each individual (here females)
  • Male placement corn : dbl “Male placement corn” is coded in r as double

    • It gives the amount of corn given to the male of the dyad before the trials

    • Within a session, it happened that we gave more placement corn to attract the monkeys back to the boxes. This led to an update of the number within the same session. The number found at the end of the session is the total placement corn an individual has received

    • I will fuse this column with MaleCorn, as the data has been split between these two variables. This is due to a mistake made when creating the original box experiment form in CyberTracker

    • This variable could be related to a monkey's level of motivation, but as it is not directly related to my hypothesis I may not use this column. I will reconsider the use of this column later on

    • In view of this possibility, I will change the format of the variable to numerical

  • MaleCorn : dbl “MaleCorn” is coded in r as double

    • It gives the same information as Male placement corn
    • I will import the values from “Male placement corn” into this one
    • I will change the format of the variable to numerical
  • FemaleCorn : dbl The data is coded in r as double

    • It gives the amount of corn given to the female of the dyad before the trials
    • It works in the same way as “Male placement corn”/“MaleCorn”
    • I will change the format of the variable to numerical
  • DyadDistance : chr The data is coded in r as character

    • It gives the distance for each trial that we have done with the dyads.
    • Trial number 1 for each dyad was at 5 meters.
    • The maximum was around 10 m while the minimum was 0 m
    • We will have to remove the “m” for meters in order to have a numerical variable instead of a character one
    • Also, since the very first trials per dyad can be considered a kind of learning phase, I may remove the first 15 trials that were made with each dyad
  • DyadResponse : chr The data is coded in r as character

    • It indicates which behaviour was produced by the dyad during each trial
    • The different behaviours were: Distracted, Female aggress male, Male aggress female, Intrusion, Loosing interest, Not approaching, Tolerance and Other
    • I will change the columns associated with each behaviour (i.e. Response) of DyadResponse into dichotomous variables in order to see the frequency of each behaviour
    • This will allow me to see which behaviours occurred more often, and whether behavioural differences can be found between dyads
    • As multiple responses could occur within the same trial, multiple behaviours can be found in a single cell. I will create a hierarchy to reduce the number of behaviours assigned to each trial (if there is more than one). This will also be complemented with the information found in the column Remarks. The goals are to:
      1. correct any mistakes (e.g. if tolerance and aggression are together, aggression > tolerance)
      2. assign as few labels per trial as possible
      3. get a better view and understanding of the data and of the most common behaviours produced by each dyad
      4. create variables that can complement the behaviour found (e.g. not approaching + looks at partner would become looks at partner + a new variable called hesitant, to see when they did not come but looked at the other individual)
    • Projection of the hierarchy (changes will be made)
      • Create a table with each existing combination

      • Decide what is more important

      • Ex:

        • Aggression > Tolerance
        • Tolerance > Not approaching -> Create a variable called hesitant in addition to the tolerance count to see the frequency of tolerance behaviour that happened after > 1min
        • Tolerance > Loosing interest
        • Tolerance > Intrusion
        • Not approaching = looking at the box but not coming, while Loosing interest = not paying attention to the box
        • Intrusion > Loosing interest
        • Intrusion > Not approaching
        • Not approaching > Looks at partner
        • We can code every look at partner as not approaching and keep the count of looks at partner as additional information
        • Not approaching >?> Loosing interest (still to decide)
        • Define distracted
        • Not approaching > Distracted
        • Aggression > Not approaching
        • Other > Look case by case and categorize depending on the behaviour
        • Remarks may be used for the same reason
  • OtherResponse : chr “OtherResponse” is coded as character

    • It describes any behaviour that is different from the ones found in DyadResponse (meaning ≠ tolerance, aggression, intrusion, loosing interest, not approaching, distracted, looks at partner, which were categorized as Other)
    • I will have to look at every OtherResponse and rename each entry as one of the existing responses where possible. I will proceed case by case.
    • If I want to do an intermediate manipulation, I may rename every NA in “OtherResponse” as Response to see the number of cases to treat and how many occurrences do not seem to fit the categories of “DyadResponse”
  • Audience : chr “Audience” is coded in r as character

    • It gives the names of the individuals in the audience
    • I would like to use it to see the size of the audience (big vs small) and the dominance level of the audience (high vs low)
    • I will create a variable called AmountAudience to see how many individuals are in the audience for each trial
    • After calculating the Elo ratings of the individuals using another dataset (Life history), I will create a dichotomous variable called RankAudience to see rank-related effects together with the audience effect
  • IDIndividual1 : chr “IDIndividual1” is coded in r as character

    • It gives the names of the individuals that did not approach, showed aggression, got distracted or lost interest during a trial
    • I will have to look at it to see how often these behaviours occurred
    • I will consider how to use this variable during the cleaning of the data
  • IntruderID : chr “IntruderID” is coded as character

    • It gives the name of the individual that intruded on the experiment during a trial
    • Intrusion could mean: invading the space of the experiment and interacting with one of our individuals, stealing the food, showing agonistic behavior, or standing in very close proximity to the dyad's individuals
  • Remarks : chr The data is coded in r as character

    • It gives additional information concerning the experiment: unusual behaviors that occurred, mistakes that needed to be corrected, or details that we wanted to record in case we would need them
  • Observers : chr The data is coded in r as character

    • It gives the names of the observers during the experiment
    • We will not use this data as we do not look at the effect that an experimenter may have on the monkeys
    • (Should I still look at an effect of the number of experimenters? …maybe better for a detailed analysis of our study)
  • DeviceID : chr “DeviceId” is coded in r as character

    • It gives the name of the device/tablet used to record the data during the experiment
    • We will not use this data either
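As an illustration of the Session variable planned under Date (1 session = 1 day for a given dyad), here is a base-R sketch on toy data; the real code would run on the Date, MaleID and FemaleID columns of the dataset, and the toy values below are illustrative.

```r
# Toy data mirroring Date/MaleID/FemaleID (values are illustrative)
toy <- data.frame(
  Date     = as.Date(c("2022-09-27", "2022-09-27", "2022-10-03", "2022-10-03")),
  MaleID   = c("Nge", "Nge", "Nge", "Xia"),
  FemaleID = c("Oerw", "Oerw", "Oerw", "Piep")
)

# One session = one unique Date within a dyad
toy$Dyad    <- paste(toy$MaleID, toy$FemaleID, sep = "-")
toy$Session <- ave(as.integer(toy$Date), toy$Dyad,
                   FUN = function(d) match(d, sort(unique(d))))
toy$Session  # → 1 1 2 1
```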

2. Treating missing data

2.1. Creating a new dataframe - Bex

  • Since I do not want to work with the whole dataset, I will select the variables of interest using the function select

  • I will keep Time, Date, Group, MaleID, FemaleID, MaleCorn, Male placement corn, FemaleCorn, DyadDistance, DyadResponse, OtherResponse, Audience, IDIndividual1, IntruderID, Remarks
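The selection could be done with dplyr's select; a base-R equivalent on a toy stand-in looks like this (toy columns only; the real call lists the 15 names above, with backticks around `Male placement corn` because of the spaces in the name).

```r
# Toy stand-in for Boxex with a few of its 20 columns
Boxex_toy <- data.frame(Data = "Box Experiment", Group = "Baie Dankie",
                        MaleID = "Nge", DeviceId = "x")

keep <- c("Group", "MaleID")  # in the report: the 15 variables listed above
Bex_toy <- Boxex_toy[, keep, drop = FALSE]
names(Bex_toy)  # → "Group" "MaleID"
```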

## Rows: 2,795
## Columns: 15
## $ Time                  <dttm> 1899-12-31 09:47:50, 1899-12-31 09:50:07, 1899-…
## $ Date                  <dttm> 2022-09-27, 2022-09-27, 2022-09-27, 2022-09-27,…
## $ Group                 <chr> "Baie Dankie", "Baie Dankie", "Baie Dankie", "Ba…
## $ MaleID                <chr> "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge",…
## $ FemaleID              <chr> "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", …
## $ MaleCorn              <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ `Male placement corn` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ FemaleCorn            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ DyadDistance          <chr> "2m", "2m", "1m", "1m", "0m", "0m", "0m", "0m", …
## $ DyadResponse          <chr> "Tolerance", "Tolerance", "Tolerance", "Toleranc…
## $ OtherResponse         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Audience              <chr> "Obse; Oup; Sirk", "Obse; Oup; Sirk", "Oup; Sirk…
## $ IDIndividual1         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ IntruderID            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Sey…
## $ Remarks               <chr> NA, NA, "Nge box did not open because of the bat…

2.1.1 Merging Male placement corn and MaleCorn

  • I want to process all the missing data in Bex. But first, I will merge the columns MaleCorn and Male placement corn, as the data of both columns is supposed to be together under “MaleCorn”
  • Looking manually at the Bex table, it seems that very little data is in MaleCorn while most of it is in Male placement corn
  • Every time there is a missing value in Male placement corn, we can see a value in MaleCorn. I will therefore create a new variable MaleCorn where, every time there is an NA in Male placement corn, the value will be taken from MaleCornOld (the previous MaleCorn). If there is no NA, it will take the value of Male placement corn
  • I will first rename MaleCorn to MaleCornOld, then check the number of NA's and then merge “MaleCornOld” and “Male placement corn” into the new variable “MaleCorn”
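The merging rule (take Male placement corn when present, otherwise MaleCornOld, otherwise 0) can be sketched on toy vectors:

```r
# Toy vectors mirroring MaleCornOld and `Male placement corn`
MaleCornOld    <- c(3, NA, NA, 5)
placement_corn <- c(NA, 7, NA, 2)

# Prefer the placement column, fall back to the old MaleCorn,
# and code rows where both are NA as 0 (no placement corn given)
MaleCorn <- ifelse(!is.na(placement_corn), placement_corn,
                   ifelse(!is.na(MaleCornOld), MaleCornOld, 0))
MaleCorn  # → 3 7 0 2
```

dplyr's coalesce() would express the same fallback more compactly.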
## Number of rows with common NAs in MaleCornOld and 'Male placement corn': 1499 
## Number of occurrences of 0 in MaleCorn: 1499 
## Number of remaining NA values in MaleCorn: 0
  • I have found 1499 NA's in common between MaleCornOld and Male placement corn, 1609 NA's in Male placement corn and 2685 in MaleCornOld

  • For the merge of MaleCornOld and Male placement corn, I used two conditions: 1. A new variable MaleCorn is created: if there is a missing value in Male placement corn, it takes the corresponding value from MaleCornOld; otherwise, it takes the value from Male placement corn. 2. If there is no value in either MaleCornOld or Male placement corn (NA, NA) for a given row, the code displays 0, as it means that no placement corn was given

  • In this way, I should not lose any data, minimize the mistakes and already transform the NA's of this variable into a number, which will remove the remaining NA's that are meant to be 0

  • After the merge, I found that there were no NA's remaining in the “new” MaleCorn and that 1499 0's were found in the column, which corresponds to the number of common NA's found previously between the “old” MaleCorn and Male placement corn

2.1.2 Cleaning FemaleCorn

## Number of remaining NA values in FemaleCorn: 0

2.2 Cleaning variables with missing data

  • Now, in order to see where the missing points are located in the data, I am going to print the variables with and without NA's

  • The function sapply is used to apply a sum of NA's to each column of the dataframe, i.e. to each variable
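On a toy frame, the counting step looks like this sketch:

```r
# Toy frame with one variable containing NAs and one without
toy <- data.frame(MaleID = c("Nge", NA, "Xia"),
                  Group  = c("Baie Dankie", "Ankhase", "Noha"))

na_counts <- sapply(toy, function(x) sum(is.na(x)))
na_counts[na_counts > 0]   # variables with missing data
na_counts[na_counts == 0]  # variables with no missing data
```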

## Variables with Missing Data:
x
MaleID 19
FemaleID 60
DyadDistance 33
DyadResponse 47
OtherResponse 2758
Audience 924
IDIndividual1 2143
IntruderID 2737
Remarks 2181
## Variables with No Missing Data:
x
Time 0
Date 0
Group 0
FemaleCorn 0
MaleCorn 0
  • We can see that out of the 15 variables we have in Bex, 9 have missing data: MaleID, FemaleID, DyadDistance, DyadResponse, OtherResponse, Audience, IDIndividual1, IntruderID and Remarks. I will proceed to clean these variables one by one

  • Before treating the NA's in the dataset, I will make a backup of the data at this point:

2.3 Treating variables with missing data

2.3.1 Cleaning “Remarks” - (2181 NA’s)

  • Since most of the time we did not have any remarks, it is understandable that this variable contains 2181 NA's out of 2795 rows

  • I will first transform every missing value in the column Remarks into No Remarks and then check the number of “No Remarks” found

  • After the changes, we can indeed see that we have 2181 “No Remarks” and no missing data left in that column. I will treat this column by hand once all the NA's have been removed from the dataset

## Number of 'No Remarks' in the 'Remarks' column: 2181
## 
## No Remarks    Remarks 
##       2181        614
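The replacement and the check above can be sketched on a toy vector:

```r
# Toy Remarks column: NAs mean "nothing to report"
Remarks <- c(NA, "Nge box did not open", NA)

n_replaced <- sum(is.na(Remarks))
Remarks[is.na(Remarks)] <- "No Remarks"

n_replaced                      # number of NAs replaced
table(Remarks == "No Remarks")  # TRUE = "No Remarks", FALSE = a real remark
```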

2.3.2 Cleaning “IntruderID” - (2737 NA’s)

  • IntruderID is a variable that contains the names of the individuals that made an intrusion during a trial.
  • If more than one individual intruded, their names may be in the remarks, which I will check when treating the data from this column
  • Because nothing was entered when there was no intrusion, I will replace every NA with No Intrusion
  • Also, I will use a function to create a new dichotomous variable called Intrusion. Every time there is an intruder name in IntruderID, it should display 1 (Yes); if not, a 0 (No intrusion)
## Number of 'No Intrusion' in the 'Intruder ID' column after replacement: 2737
  • We previously had 2737 NA's in IntruderID, and we now have the same number of occurrences of No Intrusion, which shows that the transformation went as intended
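A minimal sketch of both steps (NA replacement and the dichotomous Intrusion variable) on toy data:

```r
# Toy IntruderID column: an NA means no intrusion occurred
IntruderID <- c(NA, "Sey", NA)

IntruderID[is.na(IntruderID)] <- "No Intrusion"
Intrusion <- ifelse(IntruderID == "No Intrusion", 0, 1)

Intrusion  # → 0 1 0
```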

2.3.3 Cleaning “IDIndividual1” - (2143 NA’s)

  • IDIndividual1 is meant to report the name of the individual that displayed a behaviour such as not approaching, showing aggression or losing interest during a trial
  • I will now replace every NA in this column with No individual and print the number of NA's left and the number of changes made
## Number of NAs replaced in IDIndividual1: 2143 
## Number of remaining NA values in IDIndividual1: 0

2.3.4 Cleaning “Audience” - (924 NA’s)

  • Audience is made to report the name of every individual around our dyad during a given trial
  • I will replace every NA with No audience, as no entry means the absence of other individuals around
  • I will also create a new variable called AmountAudience that will tell me how many individuals are found in the column Audience
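Counting the audience members (names are separated by “; ” in the Audience column, as in the glimpse above) could be sketched as:

```r
# Toy Audience column; NA means nobody was around
Audience <- c("Obse; Oup; Sirk", "Oup; Sirk", NA)

Audience[is.na(Audience)] <- "No audience"
# Split on the separator and count names; "No audience" counts as 0
AmountAudience <- ifelse(Audience == "No audience", 0L,
                         lengths(strsplit(Audience, ";")))
AmountAudience  # → 3 2 0
```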
## Number of changes made in 'Audience': 924
## Remaining NA values in 'Audience': 0

2.3.5 Cleaning “OtherResponse” - (2758 NA’s)

## Number of changes made in 'OtherResponse': 2758
## Remaining NA values in 'OtherResponse': 0

2.3.6 Cleaning “Time”

  • Since reading the data is more complicated without the time, which was useful to know which trial came before or after, I changed the code made for Bex and added Time to the dataframe. Since I will need it for the cleaning of DyadDistance, I will now extract the time from the date. Even if the date is wrong, as seen in the first output, the time is correct. As the second output shows, only the time has been kept
## [1] "1899-12-31 09:47:50 UTC" "1899-12-31 09:50:07 UTC"
## [3] "1899-12-31 09:53:11 UTC" "1899-12-31 09:54:28 UTC"
## [5] "1899-12-31 09:55:19 UTC" "1899-12-31 09:56:56 UTC"
## [1] "09:47:50" "09:50:07" "09:53:11" "09:54:28" "09:55:19" "09:56:56"
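The extraction shown in the two outputs above amounts to formatting the POSIXct values as clock time only (a sketch on two toy values):

```r
# POSIXct values carrying the spurious 1899-12-31 date
Time <- as.POSIXct(c("1899-12-31 09:47:50", "1899-12-31 09:50:07"), tz = "UTC")

# Keep only the clock time as text
TimeOnly <- format(Time, "%H:%M:%S")
TimeOnly  # → "09:47:50" "09:50:07"
```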

2.3.7 Cleaning “DyadDistance”

  • Before looking at the NA's of DyadDistance, I will remove the “m” that follows every number, in order to have a numerical variable
  • Then I will look at the location of the NA's in the data to treat them case by case.
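Stripping the “m” and coercing to numeric can be sketched as below. In the real data some entries could not be converted, which is what triggers the coercion warning in the output that follows.

```r
# Toy DyadDistance column as recorded ("m" suffix, one missing value)
DyadDistance <- c("2m", "1m", "0m", NA)

Distance <- as.numeric(gsub("m", "", DyadDistance))
Distance                # → 2 1 0 NA
which(is.na(Distance))  # row positions to treat case by case
```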
## Warning: NAs introduced by coercion
## # A tibble: 69 × 16
##    Time     Date                Group    MaleID FemaleID FemaleCorn DyadDistance
##    <chr>    <dttm>              <chr>    <chr>  <chr>         <dbl>        <dbl>
##  1 12:09:34 2022-09-27 00:00:00 Baie Da… Xia    Piep              7           NA
##  2 12:13:28 2022-09-27 00:00:00 Baie Da… Xia    Piep              7           NA
##  3 16:02:32 2022-09-15 00:00:00 Ankhase  Sho    Ginq              6           NA
##  4 10:46:33 2023-08-17 00:00:00 Baie Da… Xia    Piep              0           NA
##  5 09:30:17 2023-07-29 00:00:00 Baie Da… Xin    Ouli              0           NA
##  6 12:08:51 2023-07-11 00:00:00 Baie Da… Xia    Piep              0           NA
##  7 13:30:07 2023-06-29 00:00:00 Baie Da… Sey    Sirk              0           NA
##  8 09:54:24 2023-06-27 00:00:00 Ankhase  Sho    Ginq              0           NA
##  9 10:13:56 2023-06-23 00:00:00 Ankhase  Sho    Ginq              0           NA
## 10 09:39:04 2023-06-15 00:00:00 Ankhase  Sho    Ginq              2           NA
## # ℹ 59 more rows
## # ℹ 9 more variables: DyadResponse <chr>, OtherResponse <chr>, Audience <chr>,
## #   IDIndividual1 <chr>, IntruderID <chr>, Remarks <chr>, MaleCorn <dbl>,
## #   Intrusion <dbl>, AmountAudience <dbl>
## Number of NA values in DyadDistance column (using second approach): 69 
## Rows with NA values in DyadDistance column: 24, 27, 95, 492, 744, 971, 1113, 1130, 1164, 1261, 1341, 1396, 1491, 1583, 1683, 1693, 1717, 1718, 1719, 1724, 1725, 1739, 1755, 1756, 1757, 1764, 1779, 1782, 1792, 1799, 1800, 1840, 1841, 1868, 1869, 1888, 1891, 1892, 1896, 1911, 1912, 1915, 1918, 1919, 1952, 1953, 1958, 1980, 1981, 1984, 1986, 1996, 2000, 2009, 2054, 2104, 2105, 2191, 2233, 2234, 2287, 2437, 2569, 2579, 2580, 2643, 2676, 2709, 2729
  • We have 69 missing values in DyadDistance. I will look at each row in its context, as the actual distance of the box always depended on the previous trials. I will start with the larger row numbers since, for now, the oldest trial is at the last row while the most recent one is in row 1.

    • If tolerance was achieved twice in a row = 1 m closer
    • If aggression (male aggress female or female aggress male), not approaching, or loosing interest occurred = 1 m further
    • If distracted or intrusion occurred = same distance
    1. 24 - In trial23 (0m) there was aggression, then at trial24 (1m) there was tolerance. The 24th trial is supposed to be at 1m
    2. 27 - In trial25 (1m) there was not approaching, then at trial26 (2m) there was tolerance. The 27th trial is supposed to be at 2m
    3. 95 - In trial93 (2m) there was male aggress female, then at trial94 (3m) there was not approaching. The 95th trial is supposed to be at 4m
    4. 492 - In trial490 (0m) there was tolerance then at trial491 (0m) there was tolerance. The 492nd trial is supposed to be at 0m
    5. 744 - In trial742 (3m) there was aggression then at trial743 (4m) there was tolerance. The 744th trial is supposed to be at 4m
    6. 971 - In trial969 (0m) there was tolerance then at trial970 (0m) there was tolerance. The 971st trial is supposed to be at 0m
    7. 1113 - In trial1111 (2m) there was tolerance then at trial1112 (0m) there was tolerance. The 1113th trial is supposed to be at 0m
    8. 1130 - In trial1128 there was another dyad, so we cannot use this cell. Then at trial1129 (3m) there was not approaching. Nevertheless, we don't have any DyadResponse, so I will delete this row
    9. 1164 - In trial1162 (3m) there was not approaching then at trial1163 (3m) there was not approaching. The 1164th trial is supposed to be at 4m
    10. 1261 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    11. 1341 - In trial1339 (0m) there was tolerance then at trial1340 (0m) there was tolerance. The 1341st trial is supposed to be at 0m
    12. 1396 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    13. 1491 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    14. 1583 - In trial1581 (2m) there was not approaching and intrusion then at trial1582 (2m) there was not approaching. The 1583rd trial is supposed to be at 3m
    15. 1683 - One trial only was made with tolerance (2m) but since there are no DyadResponse I will delete this row
    16. 1693 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    17. 1717 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    18. 1718 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    19. 1719 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    20. 1724 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    21. 1725 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    22. 1739 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    23. 1755 - since there are no DyadResponse I will delete this row
    24. 1756 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    25. 1757 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    26. 1764 - Since there are no DyadResponse I will delete this row
    27. 1779 - It seems like it was the first trial of the Dyad Pom Xian, if so, the distance has to be 5m
    28. 1782 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    29. 1792 - Trial1791 was intrusion (4m) so this trial should be at 4m
    30. 1799 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    31. 1800 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    32. 1840 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    33. 1841 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    34. 1868 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    35. 1869 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    36. 1888 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    37. 1891 - Since there are no DyadResponse I will delete this row
    38. 1892 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    39. 1896 - Since there are no DyadResponse I will delete this row
    40. 1911 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    41. 1912 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    42. 1915 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    43. 1918 - In trial1916 (4m) there was tolerance, then at trial1917 (4m) there was loosing interest. The 1918th trial is supposed to be at 4m
    44. 1919 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    45. 1952 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    46. 1953 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    47. 1958 - In trial1956 (2m) there was tolerance then at trial1957 (2m) there was distracted. The 1958th trial is supposed to be at 2m
    48. 1980 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    49. 1981 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    50. 1984 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    51. 1986 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    52. 1996 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    53. 2000 - In trials 1997 and 1998 (5m) there was tolerance, then at trial1999 (5m) there was intrusion. The 2000th trial is supposed to be at 4m
    54. 2009 - Since there are no DyadResponse I will delete this row
    55. 2054 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    56. 2104 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    57. 2105 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    58. 2191 - In trial2189 (1m) there was not approaching then at trial2190 (2m) there was not approaching. The 2191st trial is supposed to be at 3m
    59. 2233 - In trial2231 (3m) there was not approaching then at trial2232 (4m) there was not approaching. The 2233rd trial is supposed to be at 5m
    60. 2234 - The trial did not happen because they were not at the right distance. I will thus delete this row
    61. 2287 - Since there are no DyadResponse I will delete this row
    62. 2437 - Since there are no DyadResponse I will delete this row
    63. 2569 - In trial2567 (1m) there was tolerance then at trial2568 (1m) there was tolerance. The 2569th trial is supposed to be at 0m
    64. 2579 - In trial2577 (1m) there was tolerance then at trial2578 (0m) there was not approaching. The 2579th trial is supposed to be at 1m
    65. 2580 - The two previous trials were made with another Dyad. Also DyadResponse is not available. I will thus delete this row
    66. 2643 - Since there are no DyadResponse I will delete this row
    67. 2676 - In trial2674 (1m) there was tolerance then at trial2675 (0m) there was tolerance. The 2676th trial is supposed to be at 0m
    68. 2709 - In trial2707 (2m) there was tolerance then at trial2708 (2m) there was tolerance. The 2709th trial is supposed to be at 1m
    69. 2729 - In trial2727 (3m) there was tolerance then at trial2728 (2m) there was tolerance. The 2729th trial is supposed to be at 2m
  • Now that I have looked at each missing line and decided which ones to keep, I will create a new variable called Distance. I will also create a new variable called No trial.

  • For the variable Distance I will replace each row of missing data with the inferred value and delete the rows where no value could be assigned. This will leave no missing data and give a distance to each trial that has actually been done

  • Before making the changes I’m going to make a backup called BackupbeforeDistanceNA
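The backup-then-repair step above can be sketched as follows. This is a minimal illustration on a toy stand-in for Boxex: the row indices and the single fill value are placeholders, not the thesis's real list of 70 rows.

```r
# Toy stand-in for Boxex; only the repair logic matters here.
Boxex <- data.frame(DyadDistance = c("2m", NA, "1m", NA, "0m"),
                    stringsAsFactors = FALSE)

BackupbeforeDistanceNA <- Boxex                 # backup before any change

# Rows whose distance can be inferred from the surrounding trials
# (illustrative index/value, not the real ones):
fill_from_notes <- c("2" = "2m")
Boxex$DyadDistance[as.integer(names(fill_from_notes))] <- unname(fill_from_notes)

# Rows where no value could be assigned are deleted:
drop_rows <- c(4)
Boxex <- Boxex[-drop_rows, , drop = FALSE]

sum(is.na(Boxex$DyadDistance))                  # no NA's remain
```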

## Number of NA's in DyadDistance after replacements and deletions: 1 
## Data size after deletions: 2748
## Row index with NA in DyadDistance: 1925

It seems that there is still row 1925 with an NA in DyadDistance

  1. 1925 - In trial1923 (2m) there was distracted then at trial1924 (2m) there was tolerance. The 1925th trial is supposed to be at 2m
## Row index with NA in DyadDistance:

In this modification, I added a check to see whether the columns DyadDistance and Distance already exist in my dataframe (Bex). If they do, it prints a message saying that the modification has already been applied and no changes are made. If they don’t exist, it proceeds with the modifications. This way, running the code multiple times won’t cause redundant changes.
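The idempotency guard described above can be sketched like this, again on a toy Bex; the `sub()` conversion from "2m" to 2 is an assumption about how Distance is derived.

```r
# Toy stand-in for Bex.
Bex <- data.frame(DyadDistance = c("2m", "1m", "0m"), stringsAsFactors = FALSE)

make_distance <- function(df) {
  if ("Distance" %in% names(df)) {
    # Guard: re-knitting the chunk must not redo the work
    message("Modification already applied; nothing to do.")
  } else {
    df$Distance <- as.numeric(sub("m$", "", df$DyadDistance))  # "2m" -> 2
  }
  df
}

Bex <- make_distance(Bex)   # creates Distance
Bex <- make_distance(Bex)   # second run changes nothing
```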

  • 24 values were inserted in Distance to replace the NA’s where the distance could be found by looking at the previous rows. The 46 remaining NA’s were then removed from Distance leaving 0 remaining NA in the variable “Distance”

2.10 Cleaning Female and Male ID

  • Before cleaning Female and Male ID, here is a list of every dyad of the box experiment and their respective groups. This will help us find the missing names when only one individual is missing out of the duo (either male or female):

    1. Sirk & Sey - BD

    2. Ouli & Xin - BD

    3. Piep & Xia - BD

    4. Oerw & Nge - BD

    5. Oort & Kom - BD

    6. Ginq & Sho - AK

    7. Ndaw & Buk - AK

    8. Xian & Pom - AK

    9. Guat & Pom - AK

  • Note that the 4 letter codes correspond to the female IDs, the 3 letter codes to the male IDs and the 2 letter codes to the group names of the monkeys

  • I need to check where the NA’s are in both FemaleID and MaleID by looking at the rows where data is missing. Since every trial was made with a dyad and never with a single individual, treating these two columns together makes more sense. If both individuals are missing I may have to delete the row.

## Row numbers with missing values in FemaleID:  865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1884 1885 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
## Number of missing values in FemaleID:  59
## Row numbers with missing values in MaleID:  1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710
## Number of missing values in MaleID:  18
## Number of rows with missing values in both FemaleID and MaleID:  18
## Row numbers with missing values in both FemaleID and MaleID:  1693, 1694, 1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710
## Number of missing values in FemaleID not in MaleID:  41
## Row numbers with missing values in FemaleID not in MaleID:  865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1884, 1885, 2619, 2620, 2621, 2622, 2623, 2624, 2625, 2626, 2627, 2628, 2629
  • FemaleID has 59 NA’s while there are 18 NA’s in MaleID

  • Among these missing data, 18 NA’s are common to FemaleID and MaleID, which represents the totality of the missing values in MaleID

  • All the missing data in MaleID are found in consecutive rows, from row 1693 to row 1710, and are from the group Noha (NH) on the 19th of April 2023. The trials were made on the same day, and looking at the time of the experiment, the previous trials made and the audience, we can assess that the individuals involved were Xian for FemaleID and Pom for MaleID. These NA’s in Noha (trials 1693 to 1710) are the only NA’s that MaleID has and the only NA’s of FemaleID in Noha. I will thus replace, using a condition, every NA of MaleID in Noha with Pom and every NA of FemaleID in Noha with Xian
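The conditional replacement just described can be sketched in base R on a toy frame standing in for Bex:

```r
# Toy stand-in for Bex: two Noha rows with missing IDs, one complete row.
Bex <- data.frame(Group    = c("Noha", "Noha", "Baie Dankie"),
                  MaleID   = c(NA, "Pom", "Nge"),
                  FemaleID = c(NA, NA, "Oerw"),
                  stringsAsFactors = FALSE)

# In group Noha: NA MaleID -> "Pom", NA FemaleID -> "Xian"
noha <- Bex$Group == "Noha"
Bex$MaleID[noha & is.na(Bex$MaleID)]     <- "Pom"
Bex$FemaleID[noha & is.na(Bex$FemaleID)] <- "Xian"
```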

## Number of remaining NA values in MaleID after replacement:  0
## Number of remaining NA values in FemaleID after replacement:  41
## Number of rows with missing values in both MaleID and FemaleID after replacement:  0
  • In order to clean FemaleID, I will use the data from the now complete MaleID. I will use conditions stating that, depending on which name is found in MaleID when there is an NA in FemaleID, a certain name will replace the NA in FemaleID

  • Before automating the process I will check the data manually to see if there are any exceptions or mistakes

## Rows with missing values in FemaleID:
## # A tibble: 41 × 16
##    Time     Date                Group   MaleID FemaleID FemaleCorn DyadDistance
##    <chr>    <dttm>              <chr>   <chr>  <chr>         <dbl>        <dbl>
##  1 09:31:55 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            1
##  2 09:33:14 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            1
##  3 09:34:07 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            0
##  4 09:34:51 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            0
##  5 09:36:59 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            0
##  6 09:38:13 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            1
##  7 09:39:26 2023-07-22 00:00:00 Ankhase Buk    <NA>              7            0
##  8 09:41:11 2023-07-22 00:00:00 Ankhase Buk    <NA>              0            0
##  9 09:42:17 2023-07-22 00:00:00 Ankhase Buk    <NA>              0            0
## 10 09:44:06 2023-07-22 00:00:00 Ankhase Buk    <NA>              0            1
## # ℹ 31 more rows
## # ℹ 9 more variables: DyadResponse <chr>, OtherResponse <chr>, Audience <chr>,
## #   IDIndividual1 <chr>, IntruderID <chr>, Remarks <chr>, MaleCorn <dbl>,
## #   Intrusion <dbl>, AmountAudience <dbl>

If there is an NA in FemaleID, we will replace the value with:

    • Sirk if MaleID is Sey
    • Ouli if MaleID is Xin
    • Piep if MaleID is Xia
    • Oerw if MaleID is Nge
    • Oort if MaleID is Kom
    • Ginq if MaleID is Sho
    • Ndaw if MaleID is Buk
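These replacement rules amount to a named lookup vector from the male's code to his partner's code; a minimal sketch on a toy Bex:

```r
# Dyad pairings from the list above: male code -> partner's female code.
partner_of <- c(Sey = "Sirk", Xin = "Ouli", Xia = "Piep", Nge = "Oerw",
                Kom = "Oort", Sho = "Ginq", Buk = "Ndaw")

# Toy stand-in for Bex.
Bex <- data.frame(MaleID   = c("Buk", "Sey", "Nge"),
                  FemaleID = c(NA, NA, "Oerw"),
                  stringsAsFactors = FALSE)

# Fill FemaleID NA's from the now-complete MaleID.
miss <- is.na(Bex$FemaleID)
Bex$FemaleID[miss] <- unname(partner_of[Bex$MaleID[miss]])
```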

## # A tibble: 20 × 3
##    MaleID FemaleID Count
##    <chr>  <chr>    <int>
##  1 Xia    Piep       576
##  2 Sey    Sirk       557
##  3 Kom    Oort       338
##  4 Sho    Ginq       278
##  5 Pom    Xian       259
##  6 Buk    Ndaw       245
##  7 Xin    Ouli       159
##  8 Nge    Oerw       153
##  9 Piep   Xia         35
## 10 Oort   Kom         29
## 11 Ouli   Xin         27
## 12 Oerw   Nge         19
## 13 Sirk   Sey         17
## 14 Buk    <NA>        15
## 15 Sey    <NA>        13
## 16 Nge    <NA>        11
## 17 Buk    Ginq         6
## 18 Pom    Guat         5
## 19 Xin    Oort         4
## 20 Kom    <NA>         2
## Number of NA values in MaleID:  0
## Number of NA values in FemaleID:  0
  • After the use of the conditions on FemaleID I could see the changes were successfully done and that 0 NA’s remain in both FemaleID and MaleID

2.12 Dyad Response (7)

  • The last variable I still have to treat for NA’s is DyadResponse. Before treatment we had 47 NA’s; most of them have already been handled along the way, and only 7 remain. These NA’s can be found at rows 871, 1163, 1219, 1339, 1579, 1888 and 1962
## Rows with missing values in DyadResponse:  871, 1163, 1219, 1339, 1579, 1888, 1962
## Lines with missing values in DyadResponse:
## # A tibble: 7 × 16
##   Time     Date                Group     MaleID FemaleID FemaleCorn DyadDistance
##   <chr>    <dttm>              <chr>     <chr>  <chr>         <dbl>        <dbl>
## 1 09:39:26 2023-07-22 00:00:00 Ankhase   Buk    Ndaw              7            0
## 2 10:13:56 2023-06-23 00:00:00 Ankhase   Sho    Ginq              0            4
## 3 08:34:45 2023-06-17 00:00:00 Baie Dan… Kom    Oort              3            2
## 4 08:54:12 2023-06-09 00:00:00 Baie Dan… Xia    Piep              1            0
## 5 13:35:08 2023-05-03 00:00:00 Baie Dan… Kom    Oort              5            3
## 6 13:27:30 2023-01-18 00:00:00 Ankhase   Buk    Ndaw              5            4
## 7 08:36:49 2022-12-13 00:00:00 Baie Dan… Kom    Oort              8            4
## # ℹ 9 more variables: DyadResponse <chr>, OtherResponse <chr>, Audience <chr>,
## #   IDIndividual1 <chr>, IntruderID <chr>, Remarks <chr>, MaleCorn <dbl>,
## #   Intrusion <dbl>, AmountAudience <dbl>
  • Row 871: The previous row was tolerance at 1m and the next tolerance at 0 which means that the row 871 should be Tolerance for DyadResponse

  • Row 1163: The value can not be found from the other rows so I will delete row 1163

  • Row 1219: The previous row was not approaching at 2m and the next is tolerance at 2m and tolerance at 1m, which means that the row 1219 should be Tolerance for DyadResponse

  • Row 1339: The previous row was tolerance at 0m while the next one was tolerance at 0m, which means that row 1339 should be Tolerance for DyadResponse

  • Row 1579: The value can not be found from the other rows so I will delete row 1579

  • Row 1888: The value can not be found from the other rows so I will delete row 1888

  • Row 1962: The value can not be found from the other rows so I will delete row 1962
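The repairs above (set some rows to Tolerance, delete the unrecoverable ones) can be sketched as follows; the indices are toy stand-ins for the real rows (871, 1219, 1339 set; 1163, 1579, 1888, 1962 deleted).

```r
# Toy stand-in for Bex.
Bex <- data.frame(DyadResponse = c("Tolerance", NA, NA, "Intrusion"),
                  stringsAsFactors = FALSE)

set_tolerance <- c(2)   # rows whose response can be inferred from neighbours
delete_rows   <- c(3)   # rows where no value could be found

Bex$DyadResponse[set_tolerance] <- "Tolerance"
Bex <- Bex[-delete_rows, , drop = FALSE]

sum(is.na(Bex$DyadResponse))    # 0
```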

## Number of remaining NA values in DyadResponse:  0

2.13 Final check: remaining NA’s in Bex?

## Final check of NA values in Bex:
##           Time           Date          Group         MaleID       FemaleID 
##              0              0              0              0              0 
##     FemaleCorn   DyadDistance   DyadResponse  OtherResponse       Audience 
##              0              0              0              0              0 
##  IDIndividual1     IntruderID        Remarks       MaleCorn      Intrusion 
##              0              0              0              0              0 
## AmountAudience 
##              0

3. Correction and creation of New Variables

3.1 Making a backup of Bex

  • Before making new changes I will make a backup of my dataset at this point

3.2 Treating Remarks before processing with new modifications

  • Since I have removed all the missing data from the different columns, I now have to correct potential mistakes and create new variables to be able to manipulate my data better.

  • Since the column remarks contains corrections and additional information, I will treat it now

  • Before that, let’s check how many remarks we have in our dataset, how many of the main keywords we can find, and make a visual representation of it

3.2.2 Visualization of the Remarks Keywords

  • Before making any changes I will make a count of the total amount of remarks and a count and barplot of the main keywords in the column, to see in which proportion they are found. It has to be noted that some of the words are used in different contexts and have different meanings. This is why I will clean them manually
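A minimal sketch of the counting step, with a toy Remarks column and illustrative keywords (the real keyword list is whatever drives the barplot):

```r
# Toy Remarks column.
remarks <- c("No Remarks", "box did not open", "No Remarks",
             "intrusion by Sey", "box opened before trial")

n_no_remarks <- sum(remarks == "No Remarks")   # entries with nothing to treat
n_actual     <- sum(remarks != "No Remarks")   # entries to treat manually

# Tally keyword hits across the actual remarks (illustrative keywords):
keywords <- c("box", "intrusion", "open")
hits <- sapply(keywords, function(k) sum(grepl(k, remarks, ignore.case = TRUE)))
# barplot(hits, main = "Remark keyword frequencies")  # the visual step
```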
## Number of 'No Remarks' entries:  2139
## Number of actual remarks entries:  603
  • There will be 599 entries I will have to treat manually in the Excel Spreadsheet for the Remarks

## Total number of keyword occurrences in the Barplot:  822

3.2.3 Exporting of Bex

  • I will now export the dataset and manually treat the remarks in an Excel spreadsheet before uploading it again and creating a new dataframe. I will also print a glimpse of Bex to have information before the manual changes
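The export/re-import round trip can be sketched with a temporary CSV; the thesis writes an Excel file instead (e.g. writexl::write_xlsx paired with readxl::read_excel), but the logic is the same.

```r
# Toy stand-in for Bex.
Bex <- data.frame(Remarks = c("No Remarks", "Sho approached"),
                  stringsAsFactors = FALSE)

# Export for the manual pass, then re-import the treated file.
out <- file.path(tempdir(), "BexBeforeRemarks.csv")
write.csv(Bex, out, row.names = FALSE)

reread <- read.csv(out, stringsAsFactors = FALSE)
```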
## Glimpse of the Bex Before treating Remarks:
## Rows: 2,742
## Columns: 16
## $ Time           <chr> "09:47:50", "09:50:07", "09:53:11", "09:54:28", "09:55:…
## $ Date           <dttm> 2022-09-27, 2022-09-27, 2022-09-27, 2022-09-27, 2022-0…
## $ Group          <chr> "Baie Dankie", "Baie Dankie", "Baie Dankie", "Baie Dank…
## $ MaleID         <chr> "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge",…
## $ FemaleID       <chr> "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw",…
## $ FemaleCorn     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7…
## $ DyadDistance   <dbl> 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 1, 1, 0, 0…
## $ DyadResponse   <chr> "Tolerance", "Tolerance", "Tolerance", "Tolerance", "To…
## $ OtherResponse  <chr> "No Response", "No Response", "No Response", "No Respon…
## $ Audience       <chr> "Obse; Oup; Sirk", "Obse; Oup; Sirk", "Oup; Sirk", "Sir…
## $ IDIndividual1  <chr> "No individual", "No individual", "No individual", "No …
## $ IntruderID     <chr> "No Intrusion", "No Intrusion", "No Intrusion", "No Int…
## $ Remarks        <chr> "No Remarks", "No Remarks", "Nge box did not open becau…
## $ MaleCorn       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ Intrusion      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ AmountAudience <dbl> 3, 3, 2, 1, 2, 2, 2, 1, 1, 2, 6, 6, 3, 2, 2, 2, 2, 2, 2…
## [1] "/Users/maki/Desktop/Master Thesis/BEX 2223 Master Thesis Maung Kyaw/IVPToleranceBex2223"

3.2.4 Journal of manual changes in Bex excel spreadsheet

  • Before treating all the data in the Remarks I will create a few columns to redistribute information

    1. Context: To add contextual information
    2. SpecialBehaviour : To report any particular behaviour an individual could have done during a trial
    3. Got corn, to see if the individual got the corn or not
  • Also, whenever I have treated a remark, I will replace it with “Treated”, and if I have to delete the row I’ll write “Delete”. After re-importing the data I will make a count of these changes to see if I still have the correct amount of cells and changes that have been done

    1. Creation of the columns Context, SpecialBehaviour and GotCorn, with default values NoContext, NoSpecialBehaviour & Yes

    a. Context: BoxMalfunction, BoxOpenedBefore, NoExperiment, Agonistic, Guat;Ap;Xian, CornLeak, BetweenGroupEncounter, ContactCalling

    b.SpecialBehaviour Oerw;Vo;Exp, Sey;Ap;AfterOpen, Oerw;Vo;Exp,Nge;Vo;Exp, Sirk;ApAfter30, Sirk;Av;Oerw, Oerw;Lo,Sey;Sf;Oort,Oort;At;Kom, Kom;Ap;AfterOpen, Sey;Ch,Sirk, Xin;Hesitation. Xia;Sf;Piep, Pom;Sf;Xian,Kom;Sf;Oort, Sey;Sf;Sirk, Xia;Sf;Piep,Piep;Sf,XIa, Oort;At;Kom, Sey;Rt;Sho;Ap, Sho;Rt;Ginq;Ap, Buk;Sf;Ndaw, Sho;Rt;Ndaw;Ap, Oort;Sf;Kom, Ginq;Sho;Ap;After30, Ndaw;Sc,Buk;Sf, Ndaw;Ap;After30, Kom;Ap;After30, Xia;Piep;Ap;After30, Pom;Bi;Xian, Sho;Ndaw;Av;Buk, Kom;Sf;Oort, Kom;St;Oort,Oort;St;Kom, Sey;Hi;Sirk, Obse;Ap;Piep;Av,Piep;Sf;Xia, Sirk;ApWhenPartnerLeft, Sey;Hh;Sirk, Xia;Sf;Piep;Sc, Xia;Piep;ShareFood, Piep;Ap;After30,Xia;Mu;Piep, Oort;St;Sirk;Ja,Sey;Sf, Pom;Sf;Xian, Ndaw;ApWhenPartnerLeft, Xian;At;Pom,Gaya;Su, Xian;Sf;Pom, Xian;Hesitation, Xia;ApWhenPartnerLeft,Sirk;Hesitation, Ginq;Hesitation, Sey;Ap;Kom;Av, Oort;Sc;Kom, Xian; Pom, Pom;Ap;Xian, Pom;Ap;Xian,Xian;Rt, Sey;Ap;Sirk;Rt, Sey;St;Sirk;Ig, Xia;Asf;Piep, Piep;ApWhenPartnerLeft, Sho;Ap;After30, Ginq;ApWhenPartnerLeft, Pom;Sf;Xian;Sf;Pom, Xian;ApWhenPartnerLeft, Piep;Ch;Sirk, Sey;St;Sirk, Ndaw;Ap;After30, Xian;Ap;After30, Xian;St;Prai, Pom;Sf;Xian;Vc, Kom;Ap;After30, Kom;ApproachWithPartner, Oort;ApWhenPartnerLeft, Sho;Ap;After30,Ginq;Ap;After30, Ginq;ApproachWithPartner, Ndaw;Hesitation, Oerw;Hesitation, Oerw;ApWhenPartnerLeft. Piep;Ap;After30, Sirk;Ap;After30, Xia;Ap;After30, Ouli;Gr;BBOuli, Oerw;Ap;After30, Sirk;Hesitation, Sey;Ap;Sirk;Av, Ouli;Ap;Xia;Av, Xin;Ap;After30, Sho;Sf;Ginq;Sc, Xia;ApWhenPartnerLeft, Sey;Ap;Sirk;Ja, Nge;Oerw;ShareFood, Nge;Ap;Oerw;Oerw;At,Obse;At;Nge,

    c.GotCorn: No;Nge, No;Piep, No;Xian, No;Oort, No;Sirk, No;Kom, No;Ndaw, No;Kom, No;Oort, No;Xia, No;Buk, No;Sho, No;Sey,No;Piep, No;Ginq

    2. Remarks: Treated, TODelete
    3. Values set for existing columns
    1. IntruderID: Sey, Oerw, Guat, Kom, Gris, Sho, Oerw; Ouli, Guat; Gri, Xop, Obse, Oort, Obse; Sey, Ginq; Ghid, Xia, Grif, Sey, Gree; Gran, Godu; Gub, Gran, Oerw; Nak, Ghid, Buk, Oup

    2. DyadDistance: 6, 7, 8, 9, 1

    3. Audience: UnidentifiedAudience, Ouli; Riss, Gris, Sey, Sey; Piep; Sirk, Oup Ome

    4. IDIndividual1: Piep, Oort; Kom, Ndaw; Buk, Sho; Ginq, Ndaw, Buk, Xian, Pom, Oort; Kom, Buk; Ndaw, Sirk; Sey, Xin; Ouli, Oerw; Nge

    5. DyadResponse: Tolerance, Not approaching; Losing interest, Losing interest; Intrusion

3.3 Re-uploading the dataset after the treatment of the Remarks

  • Now that I have manually treated the remarks in a spreadsheet I will re-import the dataset
## Rows: 2,742
## Columns: 19
## $ Time             <chr> "09:47:50", "09:50:07", "09:53:11", "09:54:28", "09:5…
## $ Date             <dttm> 2022-09-27, 2022-09-27, 2022-09-27, 2022-09-27, 2022…
## $ Group            <chr> "Baie Dankie", "Baie Dankie", "Baie Dankie", "Baie Da…
## $ MaleID           <chr> "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge", "Nge…
## $ FemaleID         <chr> "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw", "Oerw…
## $ FemaleCorn       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7,…
## $ DyadDistance     <dbl> 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 1, 1, 0,…
## $ DyadResponse     <chr> "Tolerance", "Tolerance", "Tolerance", "Tolerance", "…
## $ OtherResponse    <chr> "No Response", "No Response", "No Response", "No Resp…
## $ Audience         <chr> "Obse; Oup; Sirk", "Obse; Oup; Sirk", "Oup; Sirk", "S…
## $ IDIndividual1    <chr> "No individual", "No individual", "No individual", "N…
## $ IntruderID       <chr> "No Intrusion", "No Intrusion", "No Intrusion", "No I…
## $ Remarks          <chr> "No Remarks", "No Remarks", "Treated", "Treated", "No…
## $ MaleCorn         <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ Intrusion        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
## $ AmountAudience   <dbl> 3, 3, 2, 1, 2, 2, 2, 1, 1, 2, 6, 6, 3, 2, 2, 2, 2, 2,…
## $ Context          <chr> "NoContext", "NoContext", "BoxMalfunction", "BoxOpene…
## $ SpecialBehaviour <chr> "NoSpecialBehaviour", "NoSpecialBehaviour", "Oerw;Vo;…
## $ GotCorn          <chr> "Yes", "Yes", "No;Nge", "Yes", "Yes", "Yes", "Yes", "…
  • And check for any remaining NA’s in BexClean
## Number of NA entries in Context:  0
## Number of NA entries in SpecialBehaviour:  0
## Number of NA entries in GotCorn:  0
## Number of NA entries in BexClean:  0

3.4 Time - Creation of Period and Hour

  • Time: I considered looking at the time sections in which we did the experiment. I will thus look at the time range (latest and earliest time in the day) before separating the day into different sections, to have an idea of which part of the day most of the experiments occurred in. This will not be used in my analysis, but it could be interesting to compare the amount of experiments made per day, with a line indicating the time of sunrise.

  • The Minimum Time in the dataset is 06:03:26 while the Maximum Time is at 16:36:59

  • In my box experiment I have this variable called Time that tells me when the experiment was done. I don’t think I need this information per se, but it could be easy and interesting to see from when to when the trials occur and then separate this time into a few sections: early morning, morning, midday, afternoon and end of the day

  • a. 6 to 8: Early morning
    b. 8 to 10: Morning
    c. 10 to 12: Noon
    d. 12 to 14: Afternoon
    e. 14 to 17: End of the day

  • Last, I want to create a variable called Hour that will take the value in Time and round it to the hour in which it falls, e.g. from 06:00 to 06:59 -> 6, from 07:00 to 07:59 -> 7, etc.

  • This will allow me to see in more detail when most of the trials occurred and in which hour most of the trials happened. Nevertheless, Period will be better for improved readability
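Hour and Period can be derived as sketched below, assuming Time is stored as "HH:MM:SS" strings; the cut() breaks follow the 6-8 / 8-10 / 10-12 / 12-14 / 14-17 ranges listed above (lubridate's hour() would work equally well on parsed times).

```r
# Toy Time values spanning the observed range.
Time <- c("06:03:26", "09:47:50", "13:35:08", "16:36:59")

# Hour: 06:xx -> 6, 07:xx -> 7, ...
Hour <- as.integer(substr(Time, 1, 2))

# Period: bucket the hour into the day sections listed above.
Period <- cut(Hour,
              breaks = c(6, 8, 10, 12, 14, 17),
              labels = c("Early morning", "Morning", "Noon",
                         "Afternoon", "End of the day"),
              right = FALSE, include.lowest = TRUE)
```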

3.5 Date - Creation of Month and Day

  • I want to create a variable called Month, to see the month of the experiment, and Day, so I know which day of the experiment it was (1st, 10th, 1000th…)
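A base-R sketch of Month and the experiment-day counter (lubridate::month() is the tidy equivalent for the first step):

```r
# Toy Date column.
Date <- as.Date(c("2022-09-27", "2022-09-27", "2022-10-04"))

Month <- as.integer(format(Date, "%m"))     # 9 for September, 10 for October
Day   <- match(Date, sort(unique(Date)))    # 1 for the first date, 2 for the next, ...
```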

3.6 Male and Female ID - Creation of Dyad, Trial and Session

  • I will use Female and Male ID to create different variables
    1. While checking with unique whether there are still any mistakes in FemaleID and MaleID, I saw that some of the names are in the wrong rows. I want the 3 letter male codes, whether they are in the column FemaleID or MaleID, to be in the new column Male, and the 4 letter female codes, whether they are in FemaleID or MaleID, to be in the new column Female, before checking again with unique that the transformation worked. I will use mutate
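The code-length rule makes this a one-liner per column: route whichever value has 3 letters to Male and the 4-letter value to Female. A base-R sketch (mutate + if_else is the dplyr equivalent):

```r
# Toy stand-in for Bex; row 2 has the codes swapped.
Bex <- data.frame(MaleID   = c("Sey", "Sirk", "Nge"),
                  FemaleID = c("Sirk", "Sey", "Oerw"),
                  stringsAsFactors = FALSE)

# 3-letter code -> Male, 4-letter code -> Female, whichever column it was in.
Bex$Male   <- ifelse(nchar(Bex$MaleID) == 3, Bex$MaleID, Bex$FemaleID)
Bex$Female <- ifelse(nchar(Bex$MaleID) == 3, Bex$FemaleID, Bex$MaleID)
```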
## Unique Female IDs: Sirk Ginq Piep Oerw Xin Ndaw Xia Sey Ouli Nge Oort Xian Guat Kom
## Unique Male IDs: Sey Sho Xia Nge Ouli Buk Piep Sirk Xin Oerw Kom Pom Oort
| FemaleID \ MaleID | Buk | Kom | Nge | Oerw | Oort | Ouli | Piep | Pom | Sey | Sho | Sirk | Xia | Xin |
|:------------------|----:|----:|----:|-----:|-----:|-----:|-----:|----:|----:|----:|-----:|----:|----:|
| Ginq              |   6 |   0 |   0 |    0 |    0 |    0 |    0 |   0 |   0 | 277 |    0 |   0 |   0 |
| Guat              |   0 |   0 |   0 |    0 |    0 |    0 |    0 |   5 |   0 |   0 |    0 |   0 |   0 |
| Kom               |   0 |   0 |   0 |    0 |   29 |    0 |    0 |   0 |   0 |   0 |    0 |   0 |   0 |
| Ndaw              | 259 |   0 |   0 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |    0 |   0 |   0 |
| Nge               |   0 |   0 |   0 |   19 |    0 |    0 |    0 |   0 |   0 |   0 |    0 |   0 |   0 |
| Oerw              |   0 |   0 | 164 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |    0 |   0 |   0 |
| Oort              |   0 | 337 |   0 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |    0 |   0 |   4 |
| Ouli              |   0 |   0 |   0 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |    0 |   0 | 159 |
| Piep              |   0 |   0 |   0 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |    0 | 575 |   0 |
| Sey               |   0 |   0 |   0 |    0 |    0 |    0 |    0 |   0 |   0 |   0 |   17 |   0 |   0 |
| Sirk              |   0 |   0 |   0 |    0 |    0 |    0 |    0 |   0 | 570 |   0 |    0 |   0 |   0 |
| Xia               |   0 |   0 |   0 |    0 |    0 |    0 |   35 |   0 |   0 |   0 |    0 |   0 |   0 |
| Xian              |   0 |   0 |   0 |    0 |    0 |    0 |    0 | 259 |   0 |   0 |    0 |   0 |   0 |
| Xin               |   0 |   0 |   0 |    0 |    0 |   27 |    0 |   0 |   0 |   0 |    0 |   0 |   0 |
  1. Create a variable called Male that in each row will take the name of the 3 letter code that is either in MaleID or Female ID and a variable called Female that in each row will take the name of the 4 letter code that is either in MaleID or FemaleID
## Unique Dyads: Sey Sirk Sho Ginq Xia Piep Nge Oerw Xin Ouli Buk Ndaw Buk Ginq Kom Oort Pom Xian Pom Guat Xin Oort
| Dyad     | Freq |
|:---------|-----:|
| Buk Ginq |    6 |
| Buk Ndaw |  259 |
| Kom Oort |  366 |
| Nge Oerw |  183 |
| Pom Guat |    5 |
| Pom Xian |  259 |
| Sey Sirk |  587 |
| Sho Ginq |  277 |
| Xia Piep |  610 |
| Xin Oort |    4 |
| Xin Ouli |  186 |
## Unique Male-Female Combinations:
## # A tibble: 11 × 2
##    Male  Female
##    <chr> <chr> 
##  1 Sey   Sirk  
##  2 Sho   Ginq  
##  3 Xia   Piep  
##  4 Nge   Oerw  
##  5 Xin   Ouli  
##  6 Buk   Ndaw  
##  7 Buk   Ginq  
##  8 Kom   Oort  
##  9 Pom   Xian  
## 10 Pom   Guat  
## 11 Xin   Oort
  1. Create the variable called Dyad by combining the names in MaleID and FemaleID into one name with a space between the two codes. For information, the 3 letter code is the name of the male while the 4 letter code is the name of the female, as displayed here with the correct dyads:
  • The correct dyads are:
    • Buk Ndaw
    • Kom Oort
    • Nge Oerw
    • Pom Guat
    • Pom Xian
    • Sey Sirk
    • Sho Ginq
    • Xia Piep
    • Xin Ouli
  • While the dyads we have now in the dataset are:
    • Buk Ginq 6
    • Buk Ndaw 259
    • Kom Oort 366
    • Nge Oerw 183
    • Pom Guat 5
    • Pom Xian 259
    • Sey Sirk 587
    • Sho Ginq 277
    • Xia Piep 610
    • Xin Oort 4
    • Xin Ouli 186
## [1] "Wrong Rows:"
##  [1]  613  614  615  616  617  931 2710 2711 2712 2713
## [1] "Wrong Dyads:"
##  [1] "Buk Ginq" "Buk Ginq" "Buk Ginq" "Buk Ginq" "Buk Ginq" "Buk Ginq"
##  [7] "Xin Oort" "Xin Oort" "Xin Oort" "Xin Oort"
  • There are 10 wrong dyads that I will have to identify in the dataset and manually correct. Those wrong dyads to change and identify are: Buk Ginq (6 occurrences) and Xin Oort (4 occurrences)

  • I will change the occurrences of Buk Ginq to Sho Ginq for rows 613 to 617 and row 931. I know these trials are with Sho Ginq because the comments mentioned Sho in them, while MaleID gave Buk, which was a mistake

  • For the rows from 2710 to 2713, since Ouli is in the audience, it is unlikely that we had trials with the dyad Xin Ouli. Also, I think there is little chance that the names of both individuals were entered wrong. I will replace these occurrences of Xin Oort with Kom Oort

  • I thus want Buk to be replaced by Sho in MaleID in rows 613 to 617 and row 931, and Xin to be replaced by Kom in MaleID in rows 2710 to 2713, before updating Dyad

  • If rows 613 to 617 and 931 are coded with Buk for MaleID and Ginq for FemaleID, replace the male with Sho
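A sketch of the correction on a toy frame: locate the wrong pairings by condition, fix MaleID, then rebuild Dyad.

```r
# Toy stand-in for Bex: one wrong Buk/Ginq row, one wrong Xin/Oort row.
Bex <- data.frame(MaleID   = c("Buk", "Buk", "Xin"),
                  FemaleID = c("Ginq", "Ndaw", "Oort"),
                  stringsAsFactors = FALSE)

# Buk -> Sho where the partner is Ginq; Xin -> Kom where the partner is Oort.
sho_rows <- which(Bex$MaleID == "Buk" & Bex$FemaleID == "Ginq")
kom_rows <- which(Bex$MaleID == "Xin" & Bex$FemaleID == "Oort")
Bex$MaleID[sho_rows] <- "Sho"
Bex$MaleID[kom_rows] <- "Kom"

# Rebuild Dyad after the correction.
Bex$Dyad <- paste(Bex$MaleID, Bex$FemaleID)
```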

## Rows to correct Sho Ginq:
## [1] 613 614 615 616 617 931
## Rows to correct Kom Oort:
## [1] 2710 2711 2712 2713
## Wrong Rows After Correction:
## integer(0)
## Wrong Dyads After Correction:
## character(0)
## All dyads are now correct.
## Unique Dyads after correction: Sey Sirk Sho Ginq Xia Piep Nge Oerw Xin Ouli Buk Ndaw Kom Oort Pom Xian Pom Guat
| Dyad     | Freq |
|:---------|-----:|
| Buk Ndaw |  259 |
| Kom Oort |  370 |
| Nge Oerw |  183 |
| Pom Guat |    5 |
| Pom Xian |  259 |
| Sey Sirk |  587 |
| Sho Ginq |  283 |
| Xia Piep |  610 |
| Xin Ouli |  186 |
## Number of rows changed to Sho Ginq: 6
## Number of rows changed to Kom Oort: 4
  1. Create the variable called Trial, where the data will be sorted by date and dyad in order to see how many trials have been done with each dyad (one row per dyad = one trial), and the variable called Day, where the data will be sorted by date, dyad and day in order to see how many sessions have been done with each dyad (one day per dyad = one session). Now, let’s proceed with creating the Dyad variable, Trial, and Day:
  • Experiment Day (across all dyads): This variable should count the number of unique experiment days across all dyads. For instance, if multiple dyads have trials on the same date, that date should be considered a single experiment day.

  • Dyad Day (within each dyad): This variable should count the number of unique experiment days for each dyad separately. The first day of trials for a dyad should be Day 1, the second distinct date of trials should be Day 2, and so on.

  • DaysSinceStart: Tracks the total number of days since the first experiment, counting every calendar day, including gaps between experiments.

  • ExperimentDay: Counts unique experiment dates, with each distinct date assigned a consecutive day number.

  • Trial: Numbers the trials sequentially within each dyad, starting from 1 for each dyad.

    • DyadDay: Counts unique experiment days for each dyad separately, assigning Day 1 to the first distinct date, Day 2 to the next, etc.
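The counters above can be sketched with base R's ave() (in dplyr, row_number() and dense_rank() inside group_by() give the same result):

```r
# Toy stand-in for Bex: three Buk Ndaw trials over two dates, one Sey Sirk trial.
Bex <- data.frame(Dyad = c("Buk Ndaw", "Buk Ndaw", "Buk Ndaw", "Sey Sirk"),
                  Date = as.Date(c("2022-09-29", "2022-09-29",
                                   "2022-10-04", "2022-09-29")),
                  stringsAsFactors = FALSE)
Bex <- Bex[order(Bex$Dyad, Bex$Date), ]

# Trial: sequential trial number within each dyad.
Bex$Trial <- ave(seq_len(nrow(Bex)), Bex$Dyad, FUN = seq_along)

# DyadDay: rank of the distinct date within each dyad.
Bex$DyadDay <- ave(as.integer(Bex$Date), Bex$Dyad,
                   FUN = function(d) match(d, sort(unique(d))))

# TrialDay: trial number within each dyad-day.
Bex$TrialDay <- ave(seq_len(nrow(Bex)),
                    interaction(Bex$Dyad, Bex$Date), FUN = seq_along)
```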
## # A tibble: 2,742 × 5
##    Date       Dyad     Trial DyadDay TrialDay
##    <date>     <chr>    <int>   <int>    <int>
##  1 2022-09-29 Buk Ndaw     1       1        1
##  2 2022-09-29 Buk Ndaw     2       1        2
##  3 2022-09-29 Buk Ndaw     3       1        3
##  4 2022-10-04 Buk Ndaw     4       2        1
##  5 2022-10-13 Buk Ndaw     5       3        1
##  6 2022-10-13 Buk Ndaw     6       3        2
##  7 2022-10-13 Buk Ndaw     7       3        3
##  8 2022-10-13 Buk Ndaw     8       3        4
##  9 2022-10-13 Buk Ndaw     9       3        5
## 10 2022-10-13 Buk Ndaw    10       3        6
## # ℹ 2,732 more rows
  1. Make a summary of trials and sessions so I can see how many trials and sessions have been done with each dyad
## Total Number of Unique Experiment Days: 92
## 
## Combined Summary:
Summary of Trials and Days

| Dyad     | Amount of Trials | Number of Days |
|:---------|-----------------:|---------------:|
| Xia Piep |              610 |             49 |
| Sey Sirk |              587 |             53 |
| Kom Oort |              370 |             31 |
| Sho Ginq |              283 |             35 |
| Buk Ndaw |              259 |             39 |
| Pom Xian |              259 |             19 |
| Xin Ouli |              186 |             27 |
| Nge Oerw |              183 |             22 |
| Pom Guat |                5 |              1 |

  1. After reflection I decided to remove every row that is with Pom Guat, since there are not enough trials for this dyad and since we then changed Pom Guat for Pom Xian. I have 5 occurrences to change. For easier manipulation I will remove every row where there is Guat
## Change in Rows:  -5

3.7 Female and Male Placement Corn

  • The idea is that for each dyad we gave an amount of corn to attract one monkey of the dyad to the right distance from its partner for a trial, by putting corn in the experiment box that it could get by approaching. We repeated this step as much as needed to have our dyad at the desired distance to continue the trials from the previous day of experimentation. This means I will only keep the last number per dyad and day.
  • I decided to create two variables, called PlacementMale and PlacementFemale, that will only keep the final amount of corn given to each individual within a day of experiment
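"Last value per dyad and day" can be sketched with ave() over a dyad-date grouping on a toy frame:

```r
# Toy stand-in for Bex: two placements on the first day, one on the second.
Bex <- data.frame(Dyad       = c("Nge Oerw", "Nge Oerw", "Nge Oerw"),
                  Date       = as.Date(c("2022-09-27", "2022-09-27",
                                         "2022-09-28")),
                  MaleCorn   = c(3, 5, 2),
                  FemaleCorn = c(0, 7, 1))

# For each dyad-day group, broadcast the last corn amount to every row.
last_per_day <- function(x, g) ave(x, g, FUN = function(v) v[length(v)])
g <- interaction(Bex$Dyad, Bex$Date)

Bex$PlacementMale   <- last_per_day(Bex$MaleCorn, g)
Bex$PlacementFemale <- last_per_day(Bex$FemaleCorn, g)
```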
## Placement columns were created successfully.

3.8 DyadDistance - Creation of proximity

  • I would like to create a variable called Proximity to have another measure of the proximity of the individuals.
  • First let’s look at the maximum and minimum distance found in DyadDistance
## Maximum Distance: 10
## Minimum Distance: 0
  • The minimum Distance is 0m while the maximum Distance is 10m
  • Then let’s make a graph of the frequency of each distance, with a line for the median to show the most frequent distance

  • Now let’s create a new variable called Proximity using the distances found in DyadDistance on the following model:
    1. 0 = Contact
    2. 1 - 2 = 1-2
    3. 2 - 3 = 2-3
    4. 4 - 5 = 4-5
    5. 5 - 10 = +5
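The bucketing can be sketched with cut(); the exact bin edges below are one reading of the list above (with distance 3 falling in the 2-3 band), so they should be checked against the thesis's intended cut points.

```r
# Toy DyadDistance values spanning the 0-10 m range.
DyadDistance <- c(0, 1, 2, 3, 5, 10)

# Map distances to the Proximity labels listed above.
Proximity <- cut(DyadDistance,
                 breaks = c(-Inf, 0, 2, 3, 5, Inf),
                 labels = c("Contact", "1-2", "2-3", "4-5", "+5"))
```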

3.9 Dyad Response - Corrections and detailed cleaning

3.9.1

  • Reminder: The different behaviors that are coded in DyadResponse are: Distracted, Female aggress male, Male aggress female, Intrusion, Losing interest, Not approaching, Tolerance and Other

    • I will change the columns associated with each behavior (i.e. Response) of DyadResponse into dichotomous variables in order to see the frequency of each behaviour
    • This will allow me to see which behavior occurred more than others, and what the differences between dyads are
    • As multiple behaviours could occur within the same trial, multiple responses (data entries) can be found in a single cell. I will create a hierarchy to reduce the amount of behaviors assigned to each trial (if there is more than one).
      1. correct any potential discrepancies (e.g. if tolerance and aggression occurred within the same trial, aggression > tolerance)
      2. assign as few labels per trial as possible, ideally one, using a hierarchy among the occurring behaviours to choose which response to keep
      3. get a better view and understanding of the data and of the most common behaviours produced by each dyad, producing plots and tables
      4. if necessary, create new variables that can complement or redistribute the information initially found in the column
  • I will create some tables to have a better understanding of the state of the column DyadResponse and the different existing combinations at this point

  • Also, I will create the hierarchy before implementing it in the dataset
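One way to apply such a hierarchy: split each cell on ";" and keep the highest-priority behaviour present. The priority order below is an illustrative assumption (the thesis defines its own ranking), and cells containing labels outside the hierarchy, such as "Looks at partner", would need handling before this is applied for real.

```r
# Assumed priority order: earlier = kept over later.
hierarchy <- c("Female aggress male", "Male aggress female", "Intrusion",
               "Distracted", "Losing interest", "Not approaching",
               "Other", "Tolerance")

# Return the highest-priority response found in a multi-response cell.
top_response <- function(cell) {
  parts <- trimws(strsplit(cell, ";")[[1]])
  hierarchy[min(match(parts, hierarchy))]   # smallest index = highest priority
}

top_response("Intrusion;Tolerance")    # keeps "Intrusion"
```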

3.9.2 Dyad Response description

  • I need to know how many different combinations exist within DyadResponse and how many occurrences/cells have more than one response per cell. I need the code not to take into account the order but rather the responses themselves
  • I want one table for the different combinations, and a second table showing the different combinations for only ONE response per cell
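Order-independent counting can be sketched by sorting the responses inside each cell before tabulating, so "A;B" and "B;A" collapse into one combination:

```r
# Toy DyadResponse column with one order-swapped duplicate.
DyadResponse <- c("Intrusion;Tolerance", "Tolerance;Intrusion",
                  "Tolerance", "Losing interest;Not approaching")

# Canonical form: split on ";", sort, and re-join.
combo <- sapply(strsplit(DyadResponse, ";"),
                function(p) paste(sort(trimws(p)), collapse = ";"))

table(combo)                                           # combination frequencies
n_multi <- sum(lengths(strsplit(DyadResponse, ";")) > 1)  # multi-response cells
```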
## Total number of rows in DyadResponse:  2737
## Number of rows with a single entry in DyadResponse:  2466
## Number of rows with multiple entries in DyadResponse:  271
## Number of rows with exactly 2 entries in DyadResponse:  258
## Number of rows with more than 2 entries in DyadResponse:  13
## 
## 
## Table: Rows with Single Responses in DyadResponse
## 
## |DyadResponse        | Frequency|
## |:-------------------|---------:|
## |Tolerance           |      1809|
## |Not approaching     |       465|
## |Male aggress female |        73|
## |Intrusion           |        51|
## |Losing interest     |        37|
## |Female aggress male |        18|
## |Other               |         9|
## |Distracted          |         4|

## 
## 
## Table: Rows with Multiple Responses in DyadResponse
## 
## |Combination                                       | Frequency|
## |:-------------------------------------------------|---------:|
## |Intrusion;Tolerance                               |        49|
## |Losing interest;Not approaching                   |        47|
## |Intrusion;Not approaching                         |        30|
## |Male aggress female;Tolerance                     |        27|
## |Looks at partner;Tolerance                        |        23|
## |Losing interest;Tolerance                         |        22|
## |Female aggress male;Tolerance                     |        19|
## |Looks at partner;Not approaching                  |         9|
## |Other;Tolerance                                   |         7|
## |Distracted;Not approaching                        |         6|
## |Not approaching;Tolerance                         |         4|
## |Distracted;Losing interest                        |         3|
## |Female aggress male;Male aggress female;Tolerance |         3|
## |Female aggress male;Not approaching               |         3|
## |Distracted;Tolerance                              |         2|
## |Female aggress male;Intrusion;Tolerance           |         2|
## |Male aggress female;Not approaching               |         2|
## |Distracted;Intrusion;Not approaching              |         1|
## |Distracted;Intrusion;Tolerance                    |         1|
## |Female aggress male;Intrusion                     |         1|
## |Intrusion;Losing interest                         |         1|
## |Intrusion;Losing interest;Not approaching         |         1|
## |Intrusion;Male aggress female                     |         1|
## |Intrusion;Not approaching;Tolerance               |         1|
## |Looks at partner;Losing interest;Not approaching  |         1|
## |Looks at partner;Male aggress female              |         1|
## |Looks at partner;Male aggress female;Tolerance    |         1|
## |Looks at partner;Other;Tolerance                  |         1|
## |Losing interest;Not approaching;Tolerance         |         1|
## |Not approaching;Other                             |         1|

## Unique Combinations and Counts for More than 2 Entries:
## Female aggress male & Male aggress female & Tolerance 3
## Female aggress male & Intrusion & Tolerance 2
## Distracted & Intrusion & Not approaching 1
## Distracted & Intrusion & Tolerance 1
## Intrusion & Losing interest & Not approaching 1
## Intrusion & Not approaching & Tolerance 1
## Looks at partner & Losing interest & Not approaching 1
## Looks at partner & Male aggress female & Tolerance 1
## Looks at partner & Other & Tolerance 1
## Losing interest & Not approaching & Tolerance 1
## Rows with multiple entries in DyadResponse:  3 4 13 53 58 64 68 71 72 84 88 89 97 98 113 133 137 192 194 244 275 277 278 284 289 298 302 304 318 322 326 327 328 331 341 346 347 367 368 387 388 391 397 409 454 468 469 482 497 510 511 522 529 536 597 600 626 629 687 696 706 711 713 726 740 748 749 757 760 761 763 768 769 780 786 818 829 841 843 845 861 866 877 888 892 898 912 919 934 937 938 954 955 962 973 997 1004 1015 1027 1042 1048 1060 1104 1141 1143 1164 1165 1190 1209 1213 1217 1225 1229 1236 1240 1243 1245 1247 1254 1267 1271 1284 1288 1293 1308 1310 1311 1326 1327 1330 1336 1342 1344 1353 1378 1398 1399 1407 1416 1461 1464 1473 1478 1484 1488 1489 1495 1510 1520 1521 1528 1558 1595 1598 1602 1607 1616 1629 1630 1633 1636 1639 1642 1645 1654 1658 1679 1708 1714 1718 1724 1741 1750 1752 1784 1785 1788 1795 1800 1801 1802 1804 1805 1806 1807 1808 1815 1816 1819 1820 1821 1822 1824 1825 1828 1829 1830 1831 1832 1837 1843 1856 1857 1858 1889 1897 1968 1984 2023 2066 2088 2090 2091 2100 2101 2102 2103 2104 2105 2109 2112 2113 2147 2148 2149 2152 2153 2166 2175 2184 2185 2192 2193 2194 2199 2201 2202 2205 2218 2226 2237 2248 2257 2290 2294 2343 2353 2354 2359 2360 2362 2363 2370 2388 2395 2396 2400 2465 2466 2467 2493 2534 2601 2609 2626 2629 2630 2643 2651 2657 2730
## Unique Responses and Counts (Sorted Alphabetically):
## Distracted                4
## Distracted Losing interest 3
## Female aggress male       18
## Female aggress male Intrusion 1
## Female aggress male Not approaching 3
## Intrusion                 51
## Losing interest           37
## Losing interest Intrusion 1
## Male aggress female       73
## Male aggress female Intrusion 1
## Male aggress female Looks at partner 1
## Male aggress female Not approaching 2
## Not approaching           465
## Not approaching Distracted 6
## Not approaching Distracted Intrusion 1
## Not approaching Intrusion 30
## Not approaching Looks at partner 9
## Not approaching Losing interest 47
## Not approaching Losing interest Intrusion 1
## Not approaching Losing interest Looks at partner 1
## Not approaching Other     1
## Other                     9
## Tolerance                 1809
## Tolerance Distracted      2
## Tolerance Distracted Intrusion 1
## Tolerance Female aggress male 19
## Tolerance Female aggress male Intrusion 2
## Tolerance Intrusion       49
## Tolerance Looks at partner 23
## Tolerance Looks at partner Other 1
## Tolerance Losing interest 22
## Tolerance Male aggress female 27
## Tolerance Male aggress female Female aggress male 3
## Tolerance Male aggress female Looks at partner 1
## Tolerance Not approaching 4
## Tolerance Not approaching Intrusion 1
## Tolerance Not approaching Losing interest 1
## Tolerance Other           7
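The order-insensitive counts above could be produced with a small sketch like the following (the example vector is hypothetical; the real data is the ";"-separated DyadResponse column):

```r
# Hypothetical example vector; the real data is the ";"-separated
# DyadResponse column
resp <- c("Tolerance", "Intrusion;Tolerance", "Tolerance;Intrusion",
          "Not approaching")

# Sort the entries within each cell so that order does not matter
parts  <- strsplit(resp, ";")
sorted <- vapply(parts, function(x) paste(sort(trimws(x)), collapse = ";"),
                 character(1))
n_entries <- lengths(parts)

cat("Rows with a single entry:  ", sum(n_entries == 1), "\n")
cat("Rows with multiple entries:", sum(n_entries > 1), "\n")
table(sorted[n_entries > 1])  # order-insensitive combinations
```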

3.9.3 Dyad Response Hierarchy

3.9.3.a DyadResponse Hierarchy - Looks at partner
  • I will delete all the occurrences of Looks at partner in DyadResponse, as this data was not collected during the whole period of the experiment and will not be usable for my analysis
## Number of occurrences of 'Looks at partner' in DyadResponse:  36
## Number of rows that have been changed:  36
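A minimal sketch of this deletion, assuming ";"-separated DyadResponse cells (example values are hypothetical):

```r
# Hypothetical example values; the real data is the DyadResponse column
resp <- c("Looks at partner;Tolerance", "Looks at partner;Not approaching",
          "Tolerance")
changed <- grepl("Looks at partner", resp)
cat("Occurrences of 'Looks at partner':", sum(changed), "\n")

# Drop "Looks at partner" from each ";"-separated cell
resp <- vapply(strsplit(resp, ";"),
               function(x) paste(x[x != "Looks at partner"], collapse = ";"),
               character(1))
resp
```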
3.9.3.b DyadResponse Hierarchy - Other
  • I will print the number of Other occurrences in DyadResponse and treat them to resolve some of the multiple-response combinations
## Number of 'Other' occurrences in DyadResponse:  18
Rows with ‘Other’ in DyadResponse

| Line | MaleID | FemaleID | OtherResponse |
|-----:|:-------|:---------|:--------------|
| 39   | Buk  | Ndaw | No Response |
| 286  | Kom  | Oort | Ooet scream while at the box and Kom get the corm of both |
| 300  | Kom  | Oort | Both at boxes; Kom touching the box of Oort |
| 304  | Kom  | Oort | Kom touching Oort’s box. Oort came after 30 sec to her own box. |
| 700  | Nge  | Oerw | No Response |
| 815  | Pom  | Xian | No Response |
| 826  | Pom  | Xian | No Response |
| 1484 | Sey  | Sirk | wait for her. moved back and waited for sirk to approach and came back |
| 1659 | Sho  | Ginq | No Response |
| 1741 | Sho  | Ginq | Sho took one of her corn |
| 2101 | Xia  | Piep | aggression |
| 2102 | Xia  | Piep | Xia ate corn from both boxes |
| 2103 | Xia  | Piep | Xia stolen some pieces from piep box |
| 2104 | Xia  | Piep | Xia immediately go for piep box and eat her corn then eat his. |
| 2105 | Xia  | Piep | piep got 2 pieces out of 3. Xia took one corn |
| 2170 | Piep | Xia  | fem ag male |
| 2184 | Xia  | Piep | Neither ID approach |
| 2729 | Xin  | Ouli | opened for oerw |
  • There are 18 cases where Other was recorded as a Response in the experiment. Here are the modifications I will make for them:

    • First, remember that all of these DyadResponse cells contain Other, which makes their identification easier

    • Row 39: Since the "Other" entry was No Response, I will delete Row 39 (Buk Ndaw No Response)
    • Row 286: In DyadResponse I will replace Other with Tolerance, put Kom;Ap;Sf;Oort,Oort;Rt;Sc in SpecialBehaviour, and put No;Oort in GotCorn (Kom Oort Ooet scream while at the box and Kom get the corm of both)
    • Row 300: In DyadResponse I will put Tolerance, as both individuals were at their box at the same time (Kom Oort Both at boxes; Kom touching the box of Oort)
    • Row 304: I will put Oort;Ap;After30 in SpecialBehaviour (Kom Oort Kom touching Oort’s box. Oort came after 30 sec to her own box.)
    • Row 700: Since the "Other" entry was No Response, I will delete Row 700 (Nge Oerw No Response)
    • Row 815: Since the "Other" entry was No Response, I will delete Row 815 (Pom Xian No Response)
    • Row 826: Since the "Other" entry was No Response, I will delete Row 826 (Pom Xian No Response)
    • Row 1484: I will replace the DyadResponse occurrence Other with Tolerance and put Sirk;ApproachWithPartner in SpecialBehaviour (Sey Sirk wait for her. moved back and waited for sirk to approach and came back)
    • Row 1659: Since the "Other" entry was No Response, I will delete Row 1659 (Sho Ginq No Response)
    • Row 1741: In Row 1741 Sho stole corn from Ginq. I will put Sho;Sf;Ginq in SpecialBehaviour and No;Ginq in GotCorn (Sho Ginq Sho took one of her corn)
    • Row 2101: The entry in Row 2101 ("aggression") does not make clear who aggressed whom, so I will delete Row 2101 (Xia Piep aggression)
    • Row 2102: In Row 2102 Xia stole corn from Piep. I will put Tolerance in DyadResponse, Xia;Sf;Piep in SpecialBehaviour, and No;Piep in GotCorn (Xia Piep Xia ate corn from both boxes)
    • Row 2103: In Row 2103 Xia stole corn from Piep. I will put Tolerance in DyadResponse, Xia;Sf;Piep in SpecialBehaviour, and No;Piep in GotCorn (Xia Piep Xia stolen some pieces from piep box)
    • Row 2104: In Row 2104 Xia stole corn from Piep. I will put Tolerance in DyadResponse, Xia;Sf;Piep in SpecialBehaviour, and No;Piep in GotCorn (Xia Piep Xia immediately go for piep box and eat her corn then eat his.)
    • Row 2105: In Row 2105 Xia stole corn from Piep. I will put Tolerance in DyadResponse, Xia;Sf;Piep in SpecialBehaviour, and No;Piep in GotCorn (Xia Piep piep got 2 pieces out of 3. Xia took one corn)
    • Row 2170: In Row 2170 I will replace the DyadResponse with Female aggress male (Piep Xia fem ag male)
    • Row 2184: As neither individual approached and Not approaching is already in DyadResponse, I will simply remove Other from DyadResponse (Xia Piep Neither ID approach)
    • Row 2729: As Row 2729 contains a mistake, I will delete it (Xin Ouli opened for oerw)
    • Finally, I will delete all remaining occurrences of Other in DyadResponse
  • First, I will remove the rows with the combination Other & No Response

## Number of combinations with 'Other' in DyadResponse and 'No Response' in OtherResponse:  5
## Row numbers:  39, 700, 815, 826, 1659
## Number of rows that have been changed:  5
## Number of occurrences of 'Other' in DyadResponse left:  13
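The removal step could look like this sketch (the data frame shown here is a hypothetical stand-in for the real Bex):

```r
# Hypothetical stand-in for the real Bex data frame
Bex <- data.frame(
  DyadResponse  = c("Other", "Other;Tolerance", "Tolerance"),
  OtherResponse = c("No Response", "Kom touching the box", NA),
  stringsAsFactors = FALSE
)

# Drop rows where DyadResponse contains "Other" and OtherResponse
# is exactly "No Response" (%in% handles NA safely)
to_drop <- grepl("Other", Bex$DyadResponse) &
  Bex$OtherResponse %in% "No Response"
cat("Rows removed:", sum(to_drop), "\n")
Bex <- Bex[!to_drop, ]
```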
  • Then I will treat the remaining 13 lines according to the rules I explained
## Number of rows changed:  0
  • We now have 0 remaining rows with Other in DyadResponse, having removed the No Response cases and treated the rows where the OtherResponse column contained actual content
3.9.3.c Decision making for Dyad Response Hierarchy
  • I will create a hierarchy for DyadResponse to treat cases where multiple behaviours were produced within a trial, in order to reduce the number of responses and clear any discrepancies that could be found

  • Aggression & Tolerance: I will keep Aggression, as we defined tolerance as the absence of any sign of aggression between the individuals of a dyad. Each time we wrote Tolerance it means that the monkeys touched the boxes at the same time. I will create a variable called SimultaneousTouch to record every trial where both monkeys touched the boxes at the same time. Then, in every case of aggression and tolerance within a trial, I will keep Aggression: Aggression > Tolerance. (Note that Aggression is not yet a variable in itself; instead we have Male aggress female & Female aggress male as occurrences of aggression)

## Number of cases with Tolerance & Male aggress female:  31
## Number of cases with Tolerance & Female aggress male:  24
## Number of rows changed:  55
## Number of lines remaining with both Aggression and Tolerance:  0
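A sketch of the Aggression > Tolerance rule on ";"-separated responses (example values are hypothetical):

```r
# Hypothetical example values
resp <- c("Male aggress female;Tolerance", "Female aggress male;Tolerance",
          "Tolerance")
both <- grepl("aggress", resp) & grepl("Tolerance", resp)
cat("Cases with both aggression and Tolerance:", sum(both), "\n")

# Keep the aggression, drop Tolerance: Aggression > Tolerance
resp[both] <- vapply(strsplit(resp[both], ";"),
                     function(x) paste(x[x != "Tolerance"], collapse = ";"),
                     character(1))
resp
```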
  • Tolerance & Losing interest: I will keep Tolerance, as it means that the dyad did touch the boxes: Tolerance > Losing interest
## Number of cases with Tolerance & Losing interest:  23
## Number of rows changed:  23
## Number of lines remaining with both Tolerance and Losing interest:  0
  • Tolerance & Intrusion: I will keep Tolerance. Before replacing these occurrences I will create a variable called Intrusion, whose value will be Yes if there was Intrusion in DyadResponse (alone or with another response) and No if there wasn’t any Intrusion: Tolerance > Intrusion
## Number of cases with Tolerance & Intrusion:  53
## Number of rows changed:  53
## Number of lines remaining with both Tolerance and Intrusion:  0
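A sketch of deriving the Intrusion variable and then applying Tolerance > Intrusion (hypothetical example values):

```r
# Hypothetical example values
resp <- c("Intrusion;Tolerance", "Intrusion", "Tolerance")

# Intrusion = Yes whenever Intrusion appears, alone or combined
Intrusion <- ifelse(grepl("Intrusion", resp), "Yes", "No")

# Then apply Tolerance > Intrusion to the combined cells
both <- grepl("Intrusion", resp) & grepl("Tolerance", resp)
resp[both] <- "Tolerance"
data.frame(resp, Intrusion)
```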
  • Not approaching & Losing interest: I will replace the occurrences of Losing interest with Not approaching, as individuals who did not pay attention or lost interest did not come to the box
## Initial duplicates of 'Not Approaching':  571 
## Final duplicates of 'Not Approaching':  634 
## Final number of 'Not approaching' occurrences after replacing 'Losing interest':  634 
## Number of rows with 'Losing interest' remaining: 0
  • Replacement of Losing interest with Not approaching:
    1. When there is Losing interest and Not approaching: keep Not approaching
    2. Replace the remaining occurrences of Losing interest (alone or combined with another response) with Not approaching
## Initial duplicates of 'Not Approaching':  634 
## Final duplicates of 'Not Approaching':  634 
## Final number of 'Not approaching' occurrences after replacing 'Losing interest':  634 
## Number of rows with 'Losing interest' remaining: 0
  • Replacement of Not approaching with Intrusion
  • Intrusion & Not approaching: I will keep Intrusion: Intrusion > Not approaching
## Initial duplicates of 'Not Approaching; Not Approaching':  0 
## Final duplicates of 'Not Approaching; Not Approaching':  0 
## Number of cases with Intrusion & Not Approaching:  34 
## Number of lines remaining with both Intrusion and Not Approaching:  0
  • Not approaching & Distracted: I will keep Not approaching: Not approaching > Distracted
## Initial duplicates of 'Not Approaching; Not Approaching':  0 
## Final duplicates of 'Not Approaching; Not Approaching':  0 
## Number of changes made to replace 'Distracted' with 'Not approaching':  17 
## Number of changes in 'Not approaching' entries:  8
  • Male Aggress Female / Female Aggress Male and Not Approaching: I will keep the aggression: Aggression > Not approaching
## Number of cases with Male Aggress Female & Not Approaching:  2
## Number of cases with Female Aggress Male & Not Approaching:  3
## Total number of cases with Aggression & Not Approaching:  5
## Number of changes made to remove 'Not Approaching' with 'Male aggress female':  2
## Number of changes made to remove 'Not Approaching' with 'Female aggress male':  3
## Number of remaining cases with Male Aggress Female & Not Approaching:  0
## Number of remaining cases with Female Aggress Male & Not Approaching:  0
    • Male Aggress Female / Female Aggress Male and Tolerance: I will keep Tolerance: Tolerance > Male aggress female / Female aggress male
## Number of cases with Male Aggress Female & Tolerance:  31
## Number of cases with Female Aggress Male & Tolerance:  24
## Number of changes made to remove 'Male aggress female' with 'Tolerance':  31
## Number of changes made to remove 'Female aggress male' with 'Tolerance':  24
## Number of remaining cases with Male Aggress Female & Tolerance:  0
## Number of remaining cases with Female Aggress Male & Tolerance:  0
  • Male aggress female and Female aggress male: I will create a variable called Aggression and put Yes for every occurrence of female or male aggression and No when there wasn’t any
## Number of cases with Male aggress female:  77
## Number of cases with Female aggress male:  22
## Total number of aggression cases:  99
  • Tolerance & Not approaching: I will keep Tolerance, as it means that one of the individuals took more than 30 seconds to come to the box but came in the end. In these cases I will create a variable called Hesitant to count the number of times the monkey took more time to approach than in the usual trials. Once the variable is created based on all the occurrences of Tolerance and Not approaching, I will replace all these occurrences with Tolerance: Tolerance > Not approaching
## Number of cases with Tolerance & Not Approaching:  30
## Number of rows changed:  30
## Number of lines remaining with both Tolerance and Not Approaching:  0
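A sketch of deriving the Hesitant flag before applying Tolerance > Not approaching (hypothetical example values; the real data frame and column names may differ):

```r
# Hypothetical example values
resp <- c("Not approaching;Tolerance", "Tolerance", "Not approaching")

# Hesitant = Yes when the trial combined Not approaching and Tolerance
Hesitant <- ifelse(grepl("Not approaching", resp) & grepl("Tolerance", resp),
                   "Yes", "No")

# Then apply Tolerance > Not approaching
resp[Hesitant == "Yes"] <- "Tolerance"
data.frame(resp, Hesitant)
```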
  • Treatment of the ‘Not approaching Not approaching’ duplication mistake
## Count of 'Not approaching Not approaching':  58
## Count of 'Not approaching Not approaching':  0
## Count of 'Not approaching Not approaching':  0 
## Remaining duplicates of 'Not approaching Not approaching': 0
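Collapsing such duplicated entries could be sketched as follows (hypothetical example; in the real data the duplicates may be separated by spaces rather than ";"):

```r
# Hypothetical example values
resp <- c("Not approaching;Not approaching", "Tolerance")

# Keep only the unique entries within each cell
resp <- vapply(strsplit(resp, ";"),
               function(x) paste(unique(trimws(x)), collapse = ";"),
               character(1))
resp
```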
Table of Unique Keywords and Their Frequencies in DyadResponse

| Keyword                             | Frequency |
|:------------------------------------|----------:|
| Tolerance                           |      1882 |
| Not approaching                     |       572 |
| Intrusion                           |        83 |
| Male aggress female                 |        74 |
| Tolerance Intrusion                 |        49 |
| Tolerance Not approaching           |        29 |
| Female aggress male                 |        18 |
| Female aggress male Not approaching |         3 |
| Tolerance Intrusion                 |         3 |
| Male aggress female Not approaching |         2 |
| Female aggress male Intrusion       |         1 |
| Male aggress female Intrusion       |         1 |
| Not approaching Intrusion           |         1 |
| Tolerance Not approaching Intrusion |         1 |
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set3 is 12
## Returning the palette you asked for with that many colors

Table of Unique DyadResponse Entries with More Than 10 Occurrences

| DyadResponse              |    n |
|:--------------------------|-----:|
| Tolerance                 | 1882 |
| Not approaching           |  572 |
| Intrusion                 |   83 |
| Male aggress female       |   74 |
| Tolerance Intrusion       |   49 |
| Tolerance Not approaching |   29 |
| Female aggress male       |   18 |

4.Visualizing the data (General Overview)

4.1 Reordering the variables in Bex

  • Now that all the variables have been treated and cleaned, I want to modify the variables in Bex to keep only the ones I need and to put them in the right order

  • First, let’s print the names of all the variables before reordering them and keeping only the variables of interest

## tibble [2,719 × 37] (S3: tbl_df/tbl/data.frame)
##  $ Time                  : chr [1:2719] "08:18:23" "08:37:12" "08:37:58" "08:16:30" ...
##  $ Date                  : chr [1:2719] "2022-09-29" "2022-09-29" "2022-09-29" "2022-10-04" ...
##  $ Group                 : chr [1:2719] "Ankhase" "Ankhase" "Ankhase" "Ankhase" ...
##  $ MaleID                : chr [1:2719] "Buk" "Buk" "Buk" "Buk" ...
##  $ FemaleID              : chr [1:2719] "Ndaw" "Ndaw" "Ndaw" "Ndaw" ...
##  $ FemaleCorn            : chr [1:2719] "0" "0" "0" "3" ...
##  $ DyadDistance          : chr [1:2719] "5" "5" "5" "5" ...
##  $ DyadResponse          : chr [1:2719] "Tolerance" "Tolerance" "Intrusion" "Intrusion" ...
##  $ OtherResponse         : chr [1:2719] "No Response" "No Response" "No Response" "No Response" ...
##  $ Audience              : chr [1:2719] "Ginq; Gubh" "Ginq; Gubh" "Ginq; Gubh" "Ghid; Gil; Ginq; Gom" ...
##  $ IDIndividual1         : chr [1:2719] "No individual" "No individual" "Buk; Ndaw" "Buk" ...
##  $ IntruderID            : chr [1:2719] "No Intrusion" "No Intrusion" "Ginq; Gubh" "Ginq" ...
##  $ Remarks               : chr [1:2719] "No Remarks" "No Remarks" "Treated" "Treated" ...
##  $ MaleCorn              : chr [1:2719] "0" "0" "0" "7" ...
##  $ Intrusion             : chr [1:2719] "0" "0" "1" "0" ...
##  $ AmountAudience        : chr [1:2719] "2" "2" "2" "4" ...
##  $ Context               : chr [1:2719] "NoContext" "NoContext" "NoContext" "NoContext" ...
##  $ SpecialBehaviour      : chr [1:2719] "NoSpecialBehaviour" "NoSpecialBehaviour" "NoSpecialBehaviour" "NoSpecialBehaviour" ...
##  $ GotCorn               : chr [1:2719] "Yes" "Yes" "Yes" "Yes" ...
##  $ Period                : chr [1:2719] "6 to 8" "6 to 8" "6 to 8" "6 to 8" ...
##  $ Hour                  : chr [1:2719] "08:00:00" "08:00:00" "08:00:00" "08:00:00" ...
##  $ Day                   : chr [1:2719] "16" "16" "16" "21" ...
##  $ Month                 : chr [1:2719] "2022-09" "2022-09" "2022-09" "2022-10" ...
##  $ Male                  : chr [1:2719] "Buk" "Buk" "Buk" "Buk" ...
##  $ Female                : chr [1:2719] "Ndaw" "Ndaw" "Ndaw" "Ndaw" ...
##  $ Dyad                  : chr [1:2719] "Buk Ndaw" "Buk Ndaw" "Buk Ndaw" "Buk Ndaw" ...
##  $ DaysSinceStart        : chr [1:2719] "16" "16" "16" "21" ...
##  $ ExperimentDay         : chr [1:2719] "6" "6" "6" "7" ...
##  $ ExperimentDay_Verified: chr [1:2719] "6" "6" "6" "7" ...
##  $ Trial                 : chr [1:2719] "1" "2" "3" "4" ...
##  $ DyadDay               : chr [1:2719] "1" "1" "1" "2" ...
##  $ TrialDay              : chr [1:2719] "1" "2" "3" "1" ...
##  $ PlacementMale         : chr [1:2719] "0" "0" "0" "7" ...
##  $ PlacementFemale       : chr [1:2719] "0" "0" "0" "3" ...
##  $ Proximity             : chr [1:2719] "4-5" "4-5" "4-5" "4-5" ...
##  $ DyadResponse_sorted   : chr [1:2719] "Tolerance" "Tolerance" "Intrusion;Not approaching" "Intrusion;Losing interest" ...
##  $ MultipleResponses     : chr [1:2719] "Single Response" "Single Response" ">1 Response" ">1 Response" ...
##                   Time                   Date                  Group 
##                      0                      0                      0 
##                 MaleID               FemaleID             FemaleCorn 
##                      0                      0                      0 
##           DyadDistance           DyadResponse          OtherResponse 
##                      0                      0                      0 
##               Audience          IDIndividual1             IntruderID 
##                      0                      0                      0 
##                Remarks               MaleCorn              Intrusion 
##                      0                      0                      0 
##         AmountAudience                Context       SpecialBehaviour 
##                      0                      0                      0 
##                GotCorn                 Period                   Hour 
##                      0                      0                      0 
##                    Day                  Month                   Male 
##                      0                      0                      0 
##                 Female                   Dyad         DaysSinceStart 
##                      0                      0                      0 
##          ExperimentDay ExperimentDay_Verified                  Trial 
##                      0                      0                      0 
##                DyadDay               TrialDay          PlacementMale 
##                      0                      0                      0 
##        PlacementFemale              Proximity    DyadResponse_sorted 
##                      0                      0                      0 
##      MultipleResponses 
##                      0
  • Before reordering these variables I may have to update them again, to make sure that the changes that occurred between their creation and now did not introduce any errors or mistakes

  • Finally, I will keep the following variables in this order: Date, Month, Day, Time, Period, Trial, Male, Female, Dyad, DyadDistance, Proximity, DyadResponse, Intrusion, IntruderID, SpecialBehaviour, Audience, AmountAudience, Context

##  [1] "DaysSinceStart"   "ExperimentDay"    "Date"             "Month"           
##  [5] "DyadDay"          "TrialDay"         "Trial"            "Time"            
##  [9] "Hour"             "Period"           "Male"             "Female"          
## [13] "Dyad"             "DyadDistance"     "Proximity"        "DyadResponse"    
## [17] "IDIndividual1"    "Intrusion"        "IntruderID"       "SpecialBehaviour"
## [21] "Audience"         "AmountAudience"   "Context"
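The keep-and-reorder step can be done with dplyr::select(); here is a minimal sketch on a hypothetical mini data frame (the real call would list the 23 variables shown above):

```r
library(dplyr)

# Hypothetical mini data frame; the real call would list the 23
# variables shown in the output above
df <- data.frame(Time = "08:18:23", Date = "2022-09-29",
                 Dyad = "Buk Ndaw", DyadResponse = "Tolerance")

# select() keeps only the named columns, in the given order
df <- df %>% select(Date, Time, Dyad, DyadResponse)
names(df)
```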

4.2 Exploratory Graph (To organise)

Dyad, Distance & Date

  • Trial graphs; I will have to check all of them

  • My goal here is to see whether each dyad shows a general evolution of its dyad distance through time and how much variation there is

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 65 rows containing non-finite outside the scale range
## (`stat_smooth()`).

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'

  • Check amount of audience link with aggression occurences per sex

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `KinPresent = ifelse(...)`.
## Caused by warning in `grepl()`:
## ! the pattern argument has length > 1 and only the first element will be used

4.3 Preparation of dataset for importation of life history, elo rating & focal and ad libitum data

4.3.1 Intermediate check of audience and extraction of names

  • Because I will look at the effects of the audience and of individual factors on the individuals of the dyads, I must make a list of all the individuals for which I will need to extract information.

  • First, let’s extract all of the names of individuals in Audience

  • Step 1: Clean and Process the Audience Data: We first clean the Audience column, ensuring that any “NA” values are replaced with “No audience” and split the combined names. Whitespace around names is also removed.

  • Step 2: Recalculate AmountAudience: This step ensures the AmountAudience variable is correctly updated after splitting the Audience column.

  • Step 3: Summarize Changes: We count how many times the value “NA” was replaced with “No audience” and display this count using cat().

  • Step 4: Categorize Audience Members: We categorize audience members into “Male”, “Female”, and other relevant categories based on the length of their IDs. Specific identifiers like “unkam”, “unkaf”, etc., are given appropriate labels.

  • Step 5: Display Tables by Category: We split the audience count table into sections for “Male”, “Female”, and other categories, and then display these tables separately. The use of knitr::kable() provides a professional table format that is more suitable for RMarkdown.
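The steps above can be sketched as follows (hypothetical Audience values; the 3-letter male / 4-letter female ID convention is the one shown in the name lists later in this document):

```r
# Hypothetical Audience values; IDs are ";"-separated
audience <- c("Ginq; Gubh", NA, "Ghid; Gil; Ginq; Gom")

# Step 1: replace NA with "No audience", split, and trim whitespace
audience[is.na(audience)] <- "No audience"
members <- trimws(unlist(strsplit(audience, ";")))

# Step 2: recalculate AmountAudience per row
AmountAudience <- lengths(strsplit(audience, ";"))

# Step 4: categorise by ID length (males 3 letters, females 4)
sex <- ifelse(nchar(members) == 3, "Male",
              ifelse(nchar(members) == 4, "Female", "Other"))
table(sex)
```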

## 
## **Table: Top 5 Male Audience Members**
Top 5 Male Audience Members

| Audience Member | Count |
|:----------------|------:|
| Pix             |   177 |
| Oup             |   165 |
| Nko             |   123 |
| Gom             |   103 |
| Ome             |   103 |
## 
## **Table: Top 5 Female Audience Members**
Top 5 Female Audience Members

| Audience Member | Count |
|:----------------|------:|
| Sirk            |   241 |
| Oort            |   199 |
| Oerw            |   158 |
| Piep            |   154 |
| Ginq            |   145 |
## 
## **List: All Male Audience Members**
## Pix, Oup, Nko, Gom, Ome, Sey, Nda, Ndl, Gha, Xia, Sig, Xin, Rid, Gib, Gub, Mui, Kom, Dix, Nak, Buk, Aan, Non, Aal, Sho, Guz, Pie, Hee, Pom, Syl, Nuk, App, Gri, Ott, Ree, Gil, Kno, Xar, Goe, Ask, UnkAM, Nuu, Ros, Vla, Xop, Bet, Roc, Hem, Nge, Eis, Umb, Ren, Rim, War, Atj, Gua, Vul, Bob, Nca, Tot, Her, Ram, Dal, Tch, Ris, Vry, Dak, Uls, Flu, Gab, Win
## 
## **List: All Female Audience Members**
## Sirk, Oort, Oerw, Piep, Ginq, Obse, Ghid, Reen, Gubh, Ndaw, Sitr, Godu, Skem, Ndum, Naal, Ncok, Miel, Gobe, Nkos, Ouli, Ndon, Ndin, Puol, Giji, Lewe, Enge, Papp, Eina, Hond, Rimp, Heer, Sari, Gree, Olyf, Bela, Aapi, Guba, Popp, Oase, Xeni, Nurk, Haai, Rivi, Gran, Misk, Nooi, Prat, Gris, Pikk, Pann, Prai, Riss, Riva, Ekse, Rede, Griv, Regi, UnkJ, Xati, Rooi, Gaya, Prim, Rafa, Udup, Xala, Palm, Xinp, Guat, Reno, Beir, Gese, Grim, UnkA, Gale, Pret, Prag, Prui, Raba, Rioj, UnkAF, Utic

## 
## ### Top 10 Audience Members for Dyad: Buk Ndaw 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |Ginq            |    90|
## |Ghid            |    62|
## |No audience     |    59|
## |Nda             |    47|
## |Ndl             |    42|
## |Sho             |    40|
## |Gha             |    37|
## |Gom             |    36|
## |Gubh            |    35|
## |Godu            |    29|
## 
## ### Top 10 Audience Members for Dyad: Kom Oort 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |No audience     |    95|
## |Sirk            |    51|
## |Xia             |    50|
## |Piep            |    47|
## |Reen            |    35|
## |Pix             |    29|
## |Sey             |    29|
## |Ome             |    25|
## |Oup             |    22|
## |Xin             |    21|
## 
## ### Top 10 Audience Members for Dyad: Nge Oerw 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |No audience     |    55|
## |Sirk            |    54|
## |Oup             |    40|
## |Pix             |    28|
## |Sey             |    19|
## |Sig             |    18|
## |Obse            |    16|
## |Sitr            |    13|
## |Ouli            |    12|
## |Aan             |    11|
## 
## ### Top 10 Audience Members for Dyad: Pom Xian 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |No audience     |   115|
## |Gri             |    31|
## |Gree            |    26|
## |Xar             |    23|
## |Xeni            |    21|
## |Gran            |    18|
## |Xop             |    17|
## |Prat            |    16|
## |Gris            |    15|
## |Miel            |    15|
## |Roc             |    15|
## 
## ### Top 10 Audience Members for Dyad: Sey Sirk 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |No audience     |   250|
## |Piep            |    63|
## |Oort            |    51|
## |Oerw            |    48|
## |Pix             |    48|
## |Oup             |    44|
## |Ome             |    39|
## |Reen            |    37|
## |Obse            |    34|
## |Sitr            |    29|
## 
## ### Top 10 Audience Members for Dyad: Sho Ginq 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |Ndaw            |    90|
## |Gom             |    67|
## |Gubh            |    61|
## |No audience     |    59|
## |Gha             |    54|
## |Ghid            |    49|
## |Buk             |    48|
## |Godu            |    46|
## |Gib             |    42|
## |Ndum            |    37|
## 
## ### Top 10 Audience Members for Dyad: Xia Piep 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |No audience     |   178|
## |Sirk            |   124|
## |Oort            |    90|
## |Pix             |    56|
## |Oerw            |    51|
## |Xin             |    37|
## |Naal            |    32|
## |Sitr            |    32|
## |Obse            |    31|
## |Dix             |    29|
## 
## ### Top 10 Audience Members for Dyad: Xin Ouli 
## 
## 
## |Audience Member | Count|
## |:---------------|-----:|
## |Oerw            |    45|
## |Oort            |    39|
## |Sey             |    38|
## |No audience     |    34|
## |Oup             |    30|
## |Piep            |    29|
## |Aal             |    17|
## |Xia             |    17|
## |Obse            |    16|
## |Ott             |    16|
## |Pix             |    16|

4.3.2 Intermediate check of Intrusion, extraction of names

## [1] "Ordered Unique Intruder Names:"
##  [1] "Buk"  "Ghid" "Ginq" "Godu" "Gran" "Gree" "Gri"  "Grif" "Gris" "Guat"
## [11] "Gub"  "Gubh" "Guz"  "Hee"  "Kno"  "Kom"  "Nak"  "Nda"  "Nge"  "Non" 
## [21] "Obse" "Oerw" "Oort" "Ouli" "Oup"  "Piep" "Pix"  "Sey"  "Sho"  "Sirk"
## [31] "Xia"  "Xin"  "Xop"
## [1] "Top 10 Most Frequent Intruder Names:"
##    Name Count Gender
## 1  Oerw    23 Female
## 2   Sey    19   Male
## 3  Oort    15 Female
## 4  Obse    10 Female
## 5  Ghid     9 Female
## 6   Buk     5   Male
## 7  Ginq     5 Female
## 8  Gubh     5 Female
## 9   Gri     4   Male
## 10 Piep     4 Female
## [1] "Male Names (3 letters):"
##  [1] "Guz" "Nda" "Sho" "Sey" "Pix" "Xia" "Oup" "Nak" "Gri" "Xop" "Nge" "Kno"
## [13] "Kom" "Buk" "Gub" "Xin" "Hee" "Non"
## [1] "Female Names (4 letters):"
##  [1] "Ginq" "Gubh" "Ghid" "Oerw" "Obse" "Piep" "Ouli" "Gree" "Gran" "Gris"
## [11] "Grif" "Guat" "Sirk" "Oort" "Godu"
  • After comparison, every individual that appeared in Intrusion was also in Audience, except for Grif, whom I may add to the list of audience individuals before doing the Elo rating calculations
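A comparison like the one described above can be sketched with base R's setdiff(); the name vectors here are shortened, hypothetical stand-ins for the full lists:

```r
# Hypothetical shortened stand-ins for the full intruder and audience name lists
intruders <- c("Buk", "Ghid", "Grif", "Oerw", "Sey")
audience  <- c("Buk", "Ghid", "Oerw", "Sey", "Sirk")

# Intruders never recorded as audience members -> candidates to add before Elo
missing_from_audience <- setdiff(intruders, audience)
```

With the real vectors this kind of check is what flags Grif as present in Intrusion but absent from Audience.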

4.3.3 Intermediate check of Not Approaching behaviour

4.3.4 Intermediate check of Intrusion

## `summarise()` has grouped output by 'Dyad'. You can override using the
## `.groups` argument.

4.3.5 Evolution of DyadResponse per Dyad

Behaviours tests

## `summarise()` has grouped output by 'Dyad'. You can override using the
## `.groups` argument.

5. Checkpoint - Export Data as Rds and Xlsx

6 Importing data for new variables (Age, Elo Rating, DSI)

  • I want to investigate how intra-dyadic differences can explain inter-dyadic differences in tolerance levels during the box experiment
  • For that I will calculate the age of individuals, their rank using Elo rating, and their social bond using the dyadic composite sociality index
  • I will use data files from the Inkawu Vervet Project (IVP) that contain long-term data on age, agonistic and affiliative interactions, and proximity, among others.

6.1. Assessing Age of Individuals

  • Female DOB: Use the direct DOB from LH, except for Ouli.
  • Male DOB: Use FirstRecorded - 4 years unless they have a recorded DOB.
  • Age Calculation: We’ll calculate the ages for both using the DOBs or estimated DOBs.
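The DOB rules above can be sketched in base R; this is a minimal illustration on a hypothetical two-row life-history table, using 4 × 365.25 days as the offset:

```r
# Toy life-history rows: Ouli has no recorded DOB, Ginq does (hypothetical table)
lh <- data.frame(
  Code          = c("Ouli", "Ginq"),
  DOB           = as.Date(c(NA, "2014-10-18")),
  FirstRecorded = as.Date(c("2010-11-09", "2014-10-18"))
)

# Use the recorded DOB when present, otherwise FirstRecorded minus 4 years
lh$AdjustedDOB <- lh$DOB
no_dob <- is.na(lh$AdjustedDOB)
lh$AdjustedDOB[no_dob] <- lh$FirstRecorded[no_dob] - round(4 * 365.25)

# Age in years at a hypothetical reference date
ref_date <- as.Date("2024-10-01")
lh$Age <- as.numeric(ref_date - lh$AdjustedDOB) / 365.25
```

The same logic with a 5-year offset applies to the males below.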

6.1.2 Age of Females

##   Code        DOB FirstRecorded AdjustedDOB       Age
## 1 Ouli       <NA>    2010-11-09  2006-11-09 17.960670
## 2 Xian 2012-11-05    2012-11-05  2012-11-05 11.970129
## 3 Piep 2012-01-01    2013-01-18  2012-01-01 12.816143
## 4 Ginq 2014-10-18    2014-10-18  2014-10-18 10.020740
## 5 Oort 2015-11-20    2015-11-20  2015-11-20  8.931053
## 6 Ndaw 2016-02-08    2016-02-08  2016-02-08  8.712020
## 7 Sirk 2017-10-21    2017-10-21  2017-10-21  7.011780
## 8 Oerw 2018-11-09    2018-11-09  2018-11-09  5.960424

6.1.3 Age of Male

  • Since most males dispersed from unknown groups we don't have their date of birth, but males usually disperse around 5 years old.

  • Male DOB: Use FirstRecorded - 5 years, except for a few individuals who have DOB recorded (like Xia and Xin)

  • Use DOB directly if it exists.

  • Subtract 5 years from FirstRecorded if DOB is missing.

  • Calculate age based on AdjustedDOB.

  • Output a table with the relevant columns and create a graph similar to the one for the females.

  • NOTE FOR UPDATE: the individual with the code Buk, short for BukuBuku, died on the 12th of September 2024

##   Code        DOB FirstRecorded AdjustedDOB       Age
## 1  Sey 2014-01-01    2014-12-31  2014-01-01 10.814733
## 2  Xia 2016-11-14    2016-11-14  2016-11-14  7.945406
## 3  Nge 2016-11-18    2016-11-18  2016-11-18  7.934455
## 4  Kom       <NA>    2017-09-04  2012-09-04 12.139880
## 5  Pom 2017-10-19    2017-10-19  2017-10-19  7.017256
## 6  Xin 2017-01-01    2017-10-20  2017-01-01  7.813987
## 7  Sho       <NA>    2020-10-13  2015-10-13  9.035093
## 8  Buk       <NA>    2021-05-11  2016-05-11  8.457395

6.1.4 Dyadic Age Difference

  • Age Comparison: Calculate the age for both males and females based on their estimated or recorded DOB.
  • Let's explore male and female ages before calculating differences between dyads
##    Code        DOB FirstRecorded AdjustedDOB       Age Gender
## 1  Ouli       <NA>    2010-11-09  2006-11-09 17.960670 Female
## 2  Xian 2012-11-05    2012-11-05  2012-11-05 11.970129 Female
## 3  Piep 2012-01-01    2013-01-18  2012-01-01 12.816143 Female
## 4  Ginq 2014-10-18    2014-10-18  2014-10-18 10.020740 Female
## 5  Oort 2015-11-20    2015-11-20  2015-11-20  8.931053 Female
## 6  Ndaw 2016-02-08    2016-02-08  2016-02-08  8.712020 Female
## 7  Sirk 2017-10-21    2017-10-21  2017-10-21  7.011780 Female
## 8  Oerw 2018-11-09    2018-11-09  2018-11-09  5.960424 Female
## 9   Sey 2014-01-01    2014-12-31  2014-01-01 10.814733   Male
## 10  Xia 2016-11-14    2016-11-14  2016-11-14  7.945406   Male
## 11  Nge 2016-11-18    2016-11-18  2016-11-18  7.934455   Male
## 12  Kom       <NA>    2017-09-04  2012-09-04 12.139880   Male
## 13  Pom 2017-10-19    2017-10-19  2017-10-19  7.017256   Male
## 14  Xin 2017-01-01    2017-10-20  2017-01-01  7.813987   Male
## 15  Sho       <NA>    2020-10-13  2015-10-13  9.035093   Male
## 16  Buk       <NA>    2021-05-11  2016-05-11  8.457395   Male

6.1.5 Absolute Age Difference

  • We are going to calculate the raw age difference between each male and female in a dyad, displaying the absolute age differences in order

  • Before that I will create a variable called DyadData with information on each dyad

## # A tibble: 8 × 6
##   Dyad     Male  Female MaleAge FemaleAge AgeDifference
##   <chr>    <chr> <chr>    <dbl>     <dbl>         <dbl>
## 1 Sey Sirk Sey   Sirk     10.8       7.01         3.80 
## 2 Xia Piep Xia   Piep      7.95     12.8          4.87 
## 3 Nge Oerw Nge   Oerw      7.93      5.96         1.97 
## 4 Sho Ginq Sho   Ginq      9.04     10.0          0.986
## 5 Xin Ouli Xin   Ouli      7.81     18.0         10.1  
## 6 Buk Ndaw Buk   Ndaw      8.46      8.71         0.255
## 7 Kom Oort Kom   Oort     12.1       8.93         3.21 
## 8 Pom Xian Pom   Xian      7.02     12.0          4.95
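The AgeDifference column above is just the absolute difference of the two age columns; a minimal sketch with two of the dyads' rounded ages:

```r
# Rounded ages for two of the dyads (illustrative values from the table above)
dyads <- data.frame(
  Dyad      = c("Sey Sirk", "Xin Ouli"),
  MaleAge   = c(10.81, 7.81),
  FemaleAge = c(7.01, 17.96)
)
dyads$AgeDifference <- abs(dyads$MaleAge - dyads$FemaleAge)
```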

6.1.6 Relative Age Difference

  • (Check if necessary) We are going to normalize the average age of the two IDs in a dyad

6.1.7 In order to compare the age difference across dyads we are going to calculate the z-scores of the absolute age difference
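In base R this is scale(), i.e. (x - mean(x)) / sd(x); a sketch using the absolute age differences reported above:

```r
# Absolute age differences per dyad (rounded values from the table above)
age_diff <- c(3.80, 4.87, 1.97, 0.986, 10.1, 0.255, 3.21, 4.95)

# z-scores: centred on 0 with standard deviation 1, so dyads are comparable
z_age <- as.numeric(scale(age_diff))
```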

6.1.7 Age direction

  • For further analysis we may consider if the male or female was older
## # A tibble: 8 × 4
##   Male  Female AgeDifference AgeDirection
##   <chr> <chr>          <dbl> <chr>       
## 1 Sey   Sirk           3.80  Male Older  
## 2 Xia   Piep           4.87  Female Older
## 3 Nge   Oerw           1.97  Male Older  
## 4 Sho   Ginq           0.986 Female Older
## 5 Xin   Ouli          10.1   Female Older
## 6 Buk   Ndaw           0.255 Female Older
## 7 Kom   Oort           3.21  Male Older  
## 8 Pom   Xian           4.95  Female Older
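The AgeDirection labels can be derived with a single ifelse(); a sketch on two dyads' rounded ages:

```r
male_age   <- c(10.81, 7.95)   # Sey, Xia (rounded)
female_age <- c(7.01, 12.82)   # Sirk, Piep (rounded)

# Label which member of the dyad is older
AgeDirection <- ifelse(male_age > female_age, "Male Older", "Female Older")
```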

6.1.8 Age Hypothesis

  • A. Age Hypothesis: We expect that the higher the age difference in a dyad, the less likely they are to reach tolerance compared to dyads with a smaller difference. In addition we will check if there is an effect of Age Direction (whether the male or the female is older)

6.1.8.1 Creation of Binomial tolerance

  • In order to check the effect of age on tolerance I will create a dichotomous variable displaying 1 if there is tolerance and 0 if there is not
## # A tibble: 8 × 6
##   Dyad     TotalTrials ToleranceCount NoToleranceCount ToleranceProportion
##   <chr>          <int>          <dbl>            <dbl>               <dbl>
## 1 Buk Ndaw         258            145              113               0.562
## 2 Kom Oort         367            305               62               0.831
## 3 Nge Oerw         182            121               61               0.665
## 4 Pom Xian         257            171               86               0.665
## 5 Sey Sirk         586            401              185               0.684
## 6 Sho Ginq         281            158              123               0.562
## 7 Xia Piep         603            473              130               0.784
## 8 Xin Ouli         185            108               77               0.584
## # ℹ 1 more variable: NoToleranceProportion <dbl>

## # A tibble: 8 × 10
##   Dyad     Male  Female AgeDifference AgeDirection TotalTrials ToleranceCount
##   <chr>    <chr> <chr>          <dbl> <chr>              <int>          <dbl>
## 1 Sey Sirk Sey   Sirk           3.80  Male Older           586            401
## 2 Xia Piep Xia   Piep           4.87  Female Older         603            473
## 3 Nge Oerw Nge   Oerw           1.97  Male Older           182            121
## 4 Sho Ginq Sho   Ginq           0.986 Female Older         281            158
## 5 Xin Ouli Xin   Ouli          10.1   Female Older         185            108
## 6 Buk Ndaw Buk   Ndaw           0.255 Female Older         258            145
## 7 Kom Oort Kom   Oort           3.21  Male Older           367            305
## 8 Pom Xian Pom   Xian           4.95  Female Older         257            171
## # ℹ 3 more variables: NoToleranceCount <dbl>, ToleranceProportion <dbl>,
## #   NoToleranceProportion <dbl>
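The dichotomous variable and the per-dyad proportion shown above could be built like this; the toy DyadResponse values are hypothetical placeholders for the real categories:

```r
# Toy responses; "Other" stands in for any non-tolerance category
DyadResponse <- c("Tolerance", "Other", "Tolerance", "Tolerance")

# 1 = tolerance, 0 = anything else
ToleranceBinomial <- ifelse(DyadResponse == "Tolerance", 1L, 0L)

# Proportion of tolerant trials
ToleranceProportion <- mean(ToleranceBinomial)
```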

6.1.9 Age & Tolerance

  • I want to test whether age can predict tolerance and if there is an effect of age direction. In addition, I may track the combined effect of age difference and direction on tolerance

6.1.9.1

  • Model 1: Absolute age difference ~ tolerance
## `geom_smooth()` using formula = 'y ~ x'

## 
##  Pearson's product-moment correlation
## 
## data:  dyad_summary$AgeDifference and dyad_summary$ToleranceProportion
## t = 0.19274, df = 6, p-value = 0.8535
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6628703  0.7420961
## sample estimates:
##        cor 
## 0.07844459
## 
## Call:
## lm(formula = ToleranceProportion ~ AgeDifference, data = dyad_summary)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.099630 -0.096714 -0.001366  0.041322  0.165240 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.657686   0.062557  10.513 4.35e-05 ***
## AgeDifference 0.002536   0.013155   0.193    0.854    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1076 on 6 degrees of freedom
## Multiple R-squared:  0.006154,   Adjusted R-squared:  -0.1595 
## F-statistic: 0.03715 on 1 and 6 DF,  p-value: 0.8535
  • We are going to conduct a Spearman's rank correlation because we don't have a normal distribution and this method considers the order of the data

  • Comparing with the previous results:
    • Pearson's correlation: correlation coefficient (r) ≈ 0.0784, p-value = 0.8535. Interpretation: no significant linear relationship.
    • Spearman's correlation: correlation coefficient (rho) = 0.4286, p-value = 0.2992. Interpretation: suggests a moderate monotonic relationship, but not statistically significant.
    • Why the difference? Pearson's correlation measures linear relationships, is sensitive to outliers, and requires normally distributed variables. Spearman's correlation measures monotonic relationships, is less sensitive to outliers, and does not require a normal distribution.

## 
##  Spearman's rank correlation rho
## 
## data:  dyad_summary$AgeDifference and dyad_summary$ToleranceProportion
## S = 48, p-value = 0.2992
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4285714

## `geom_smooth()` using formula = 'y ~ x'

  • Using a Spearman correlation we found a correlation coefficient (rho) of 0.4285714, indicating a moderate positive association between the ranks of age difference and tolerance proportion. It seems that as the age difference increases, the tolerance proportion increases

  • But looking at the p-value of 0.2992 we can see that the correlation is not statistically significant: there is a 29.92% probability that the observed correlation occurred by chance

  • For the scatterplot (X-axis: AgeDifference, Y-axis: ToleranceProportion), the Pearson correlation coefficient (0.0784) is very weak and its p-value (0.8535) is not significant

  • It seems that there is no significant correlation between age difference and tolerance proportion
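Both correlations come from stats::cor.test(); a sketch on the rounded dyad values (with rounding and ties, the coefficients will differ slightly from the ones reported above):

```r
# Rounded per-dyad values from the tables above
age_diff <- c(3.80, 4.87, 1.97, 0.986, 10.1, 0.255, 3.21, 4.95)
tol_prop <- c(0.684, 0.784, 0.665, 0.562, 0.584, 0.562, 0.831, 0.665)

pearson  <- cor.test(age_diff, tol_prop, method = "pearson")
spearman <- cor.test(age_diff, tol_prop, method = "spearman", exact = FALSE)
```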

6.1.9.2

  • Model 2: Age direction ~ tolerance
## `geom_smooth()` using formula = 'y ~ x'

## 
##  Welch Two Sample t-test
## 
## data:  ToleranceProportion by AgeDirection
## t = -1.4071, df = 4.5304, p-value = 0.2242
## alternative hypothesis: true difference in means between group Female Older and group Male Older is not equal to 0
## 95 percent confidence interval:
##  -0.27457658  0.08425425
## sample estimates:
## mean in group Female Older   mean in group Male Older 
##                  0.6315716                  0.7267327

  • The p-value of 0.2242 suggests that there is no significant difference in tolerance proportions between dyads where the male is older and those where the female is older at the conventional alpha level of 0.05.
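The comparison above corresponds to a Welch two-sample t-test; a sketch using the rounded per-dyad tolerance proportions and age directions:

```r
# Rounded per-dyad values (Male Older: Sey Sirk, Nge Oerw, Kom Oort)
dyad_summary <- data.frame(
  ToleranceProportion = c(0.684, 0.665, 0.831, 0.784, 0.562, 0.584, 0.562, 0.665),
  AgeDirection = c("Male Older", "Male Older", "Male Older",
                   "Female Older", "Female Older", "Female Older",
                   "Female Older", "Female Older")
)
direction_test <- t.test(ToleranceProportion ~ AgeDirection, data = dyad_summary)
```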

6.1.9.3

  • Model 3: Absolute age + Age direction ~ tolerance

  • First lets do a regression model

## 
## Call:
## lm(formula = ToleranceProportion ~ AgeDifference + AgeDirection, 
##     data = dyad_summary)
## 
## Residuals:
##        1        2        3        4        5        6        7        8 
## -0.04738  0.14899 -0.05564 -0.04934 -0.08396 -0.04513  0.10302  0.02944 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.605582   0.069737   8.684 0.000335 ***
## AgeDifference          0.006127   0.012566   0.488 0.646498    
## AgeDirectionMale Older 0.102800   0.075076   1.369 0.229212    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1005 on 5 degrees of freedom
## Multiple R-squared:  0.2772, Adjusted R-squared:  -0.01193 
## F-statistic: 0.9588 on 2 and 5 DF,  p-value: 0.4442

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = ToleranceProportion ~ AgeDifference * AgeDirection, 
##     data = dyad_summary)
## 
## Residuals:
##        1        2        3        4        5        6        7        8 
## -0.06620  0.14940 -0.03185 -0.05146 -0.08012 -0.04772  0.09805  0.02991 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                          0.608345   0.077844   7.815  0.00145 **
## AgeDifference                        0.005475   0.014107   0.388  0.71769   
## AgeDirectionMale Older               0.030261   0.272127   0.111  0.91681   
## AgeDifference:AgeDirectionMale Older 0.023947   0.085541   0.280  0.79340   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1113 on 4 degrees of freedom
## Multiple R-squared:  0.2911, Adjusted R-squared:  -0.2406 
## F-statistic: 0.5475 on 3 and 4 DF,  p-value: 0.676
## `geom_smooth()` using formula = 'y ~ x'

  • Check of model assumptions:
    • Residuals vs. Fitted: check for linearity and homoscedasticity
    • Normal Q-Q plot: assess normality of residuals
    • Scale-Location plot: check for homoscedasticity
    • Residuals vs. Leverage: identify influential observations
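These four diagnostics are the default panels produced by plot() on an lm object; a self-contained sketch using the rounded dyad values:

```r
# Rounded per-dyad values standing in for the real dyad_summary
dyad_summary <- data.frame(
  AgeDifference       = c(3.80, 4.87, 1.97, 0.986, 10.1, 0.255, 3.21, 4.95),
  ToleranceProportion = c(0.684, 0.784, 0.665, 0.562, 0.584, 0.562, 0.831, 0.665)
)
fit <- lm(ToleranceProportion ~ AgeDifference, data = dyad_summary)

# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
par(mfrow = c(2, 2))
plot(fit)
```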

6.1.9.4 Age Conclusion

  • It seems that both age difference and age direction within dyads are not significant and do not predict differences in tolerance rates
  • This may come from the very small sample, but we will try to explain tolerance with other factors such as rank and initial social bond

6.1.9.5 Age Class for Elo Rating Calculations

6.2 Elo Rating

  • In order to calculate the Elo Rating, I will have to create different files

    1. FinalAgonistic using the files Agonistic2016-2023.csv, Agonistic.csv and Focal.csv
    2. WinnerLoser using the newly created FinalAgonistic.csv and IVP Life history_180424.csv
    3. Elo rating using presence matrices from the files AK2020-2024.csv, BD2020-2024.csv, KB2020-2024.csv, NH2020-2024.csv and WinnerLoser.csv
  • As male and female hierarchies are distinct I will have to calculate them separately

  • Specifically for these codes I may get seqcheck() and elo.seq() function errors such as: "First interaction occurred before presence (approx)", meaning that an interaction was already recorded before the male was added to that group in the life history

    • In these cases I will have to either change the presence file manually or get rid of interactions before his first presence date.
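The second option, dropping interactions recorded before an individual's first presence date, could be sketched like this (toy winner-loser records and hypothetical first-presence dates):

```r
# Toy winner-loser records
interactions <- data.frame(
  Date   = as.Date(c("2020-01-05", "2020-03-01")),
  winner = c("Nge", "Kom"),
  loser  = c("Kom", "Nge")
)
# Hypothetical first-presence date per individual
first_presence <- as.Date(c(Nge = "2019-01-01", Kom = "2020-02-01"))

# Keep only interactions where both individuals were already present
keep  <- interactions$Date >= pmax(first_presence[interactions$winner],
                                   first_presence[interactions$loser])
clean <- interactions[keep, ]
```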

6.2.1 Creation of Final Agonistic

  • I will now create FinalAgonistic.csv using Agonistic data from 2016 to 2023 and maybe the latest agonistic file until May 2023. I will also use Focal data from June 2022 until the latest focals (date to check)

  • FinalAgonistic.csv will combine the above input files after filtering one-on-one interactions (excluding support interactions). It will involve cleaning and merging the raw data from agonistic interactions and focals to prepare it for the winner-loser calculations.

  • I will use the information on each first experiment day to know when to stop the Elo calculations. For each dyad I will take the beginning of the month of the experiment.

  • BD1 > SEPT 2022

    1. Sey Sirk : 14.09.2022
    2. Xia Piep : 16.09.2022
    3. Nge Oerw : 22.09.2022
    4. Xin Ouli : 27.09.2022
  • AK > SEPT 2022

    1. Sho Ginq : 27.09.2022
    2. Buk Ndaw : 29.09.2022
  • BD2 > DEC 2022

    1. Kom Oort : 12.12.2022
  • NH > MAR 2023

    1. Pom Xian : 10.03.2023
  • Also, because the latest dyad starts its first trial after 10.03.2023, I will remove all the Elo data after this date

  • Note that male and female hierarchies are not comparable on the same scale in vervet monkeys, so I will have to conduct 8 Elo rating calculations

    1. BD1: I will calculate Elo for all females in the BD group so we can extract the ratings of Sirk, Piep, Oerw and Ouli, while for males we will look at the ranks of Sey, Xia, Nge and Xin. I will calculate their Elo using the presence data and winner-loser data until 13.10.2022
    2. BD2: I will calculate Elo again for males and females using the data until 11.12.2022 to extract the Elo of Kom and Oort before their first trial day
    3. For AK, the Elo will be calculated at once using the data until 26.09.2022 to then extract the ranks of females Ginq and Ndaw and males Buk and Sho
    4. The last Elo calculations for males and females in NH will use data until 09.03.2023 to extract the ranks of female Xian and male Pom

6.2 Female Elo Ratings
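The per-group cutoff logic amounts to filtering the winner-loser file by group and date before running the Elo functions (the ratings themselves would come from elo.seq()/extract_elo() in the EloRating package); a base-R sketch with toy data:

```r
# Cutoff dates per calculation batch (from the list above)
cutoffs <- as.Date(c(BD1 = "2022-10-13", BD2 = "2022-12-11",
                     AK  = "2022-09-26", NH  = "2023-03-09"))

# Toy winner-loser records (hypothetical rows)
winnerloser <- data.frame(
  Date  = as.Date(c("2022-09-01", "2022-11-20", "2023-02-01")),
  Group = c("BD", "BD", "NH")
)

# Interactions feeding the BD1 calculation: BD group, up to the BD1 cutoff
bd1 <- subset(winnerloser, Group == "BD" & Date <= cutoffs["BD1"])
```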

6.3 Male Elo Ratings

6.3 DSI - Dyadic composite Sociality Index

6.4 (Does tolerance predict distance?)

  • Thinking about
  1. Check if tolerance predicts distance
  2. If not, check whether distance or tolerance is predicted by other factors
  3. Check which predictors better explain tolerance, from
    1. Age difference
    2. Rank difference
    3. Initial social bond
  4. Additional check to see if we can improve the models using
    1. male tenure
    2. female pregnancy
  • First we are going to investigate the relationship between distance and tolerance proportion

  • Summary and Bar chart for Dyad distance and tolerance binomial

## `summarise()` has grouped output by 'DyadDistance'. You can override using the
## `.groups` argument.

## 
##  Shapiro-Wilk normality test
## 
## data:  BexClean$DyadDistance
## W = 0.85818, p-value < 2.2e-16
## 
##  Shapiro-Wilk normality test
## 
## data:  BexClean$ToleranceBinomial
## W = 0.58055, p-value < 2.2e-16
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  DyadDistance by ToleranceBinomial
## W = 869192, p-value = 8.901e-06
## alternative hypothesis: true location shift is not equal to 0
## 
##  Welch Two Sample t-test
## 
## data:  DyadDistance by ToleranceBinomial
## t = 4.5617, df = 1474.5, p-value = 5.494e-06
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.1868450 0.4687605
## sample estimates:
## mean in group 0 mean in group 1 
##        1.860215        1.532412

Step 7: Reporting the Results. In my write-up, I would report something like this:

Normality assumption: Both DyadDistance and ToleranceBinomial were found to be non-normally distributed based on the Shapiro-Wilk test (p-values < 0.05). Therefore, we proceeded with non-parametric statistical methods.

Wilcoxon Rank Sum Test: The Wilcoxon test revealed a statistically significant difference in dyad distance between trials with and without tolerance (p-value = 8.901e-06). This suggests that dyads that were tolerant were, on average, closer than those that were not.

t-test for comparison: A t-test supported the Wilcoxon test’s findings, further demonstrating that mean distances were significantly different (p-value = 5.494e-06).
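All three tests reported here are in base stats; a sketch on simulated stand-in data for the real BexClean columns:

```r
set.seed(42)
# Simulated stand-ins for the real DyadDistance and ToleranceBinomial columns
DyadDistance      <- sample(0:5, 200, replace = TRUE)
ToleranceBinomial <- rbinom(200, 1, 0.65)

shapiro.test(DyadDistance)                            # normality check
wtest <- wilcox.test(DyadDistance ~ ToleranceBinomial) # non-parametric comparison
ttest <- t.test(DyadDistance ~ ToleranceBinomial)      # parametric comparison
```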

  • Let's now include dyad as an explanatory factor to see if distance and tolerance differ per dyad
## `summarise()` has grouped output by 'Dyad', 'DyadDistance'. You can override
## using the `.groups` argument.

## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
##  Shapiro-Wilk normality test
## 
## data:  BexClean$DyadDistance
## W = 0.85818, p-value < 2.2e-16
## 
## Call:
## glm(formula = ToleranceBinomial ~ DyadDistance * Dyad, family = "binomial", 
##     data = BexClean)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0219  -1.2745   0.6983   0.9295   1.3036  
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                0.499924   0.221339   2.259 0.023906 *  
## DyadDistance              -0.105085   0.075864  -1.385 0.165997    
## DyadKom Oort               1.384136   0.281964   4.909 9.16e-07 ***
## DyadNge Oerw              -0.382906   0.321745  -1.190 0.234010    
## DyadPom Xian               0.228396   0.302935   0.754 0.450882    
## DyadSey Sirk              -0.222303   0.263713  -0.843 0.399245    
## DyadSho Ginq               0.179562   0.334319   0.537 0.591201    
## DyadXia Piep               0.787217   0.250780   3.139 0.001695 ** 
## DyadXin Ouli              -0.005553   0.335118  -0.017 0.986779    
## DyadDistance:DyadKom Oort -0.224558   0.124341  -1.806 0.070921 .  
## DyadDistance:DyadNge Oerw  0.576587   0.173023   3.332 0.000861 ***
## DyadDistance:DyadPom Xian  0.088974   0.098107   0.907 0.364459    
## DyadDistance:DyadSey Sirk  0.430648   0.109089   3.948 7.89e-05 ***
## DyadDistance:DyadSho Ginq -0.056796   0.111836  -0.508 0.611554    
## DyadDistance:DyadXia Piep  0.111038   0.115011   0.965 0.334316    
## DyadDistance:DyadXin Ouli  0.051274   0.102832   0.499 0.618046    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3357.2  on 2718  degrees of freedom
## Residual deviance: 3196.1  on 2703  degrees of freedom
## AIC: 3228.1
## 
## Number of Fisher Scoring iterations: 4
## `geom_smooth()` using formula = 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 8.5375e-30
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 5.5579e-16
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.03
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2.03
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 3.501e-15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1.025
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 2.4783e-30
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1

## `geom_smooth()` using formula = 'y ~ x'

Dyad and Tolerance

##             Df  Sum Sq  Mean Sq
## Dyad         7 0.06992 0.009989
## 
##  Kruskal-Wallis rank sum test
## 
## data:  ToleranceProportion by Dyad
## Kruskal-Wallis chi-squared = 7, df = 7, p-value = 0.4289

7. EDA TESTING ; EXPLORATION and TESTS

Step-by-Step Approach

  1. Visualizing Behavioral Trends Over Time: We'll start by analyzing how key behaviors (like tolerance, male aggression, female aggression, and not approaching) evolve over time.
    • Goal: Understand if there's a consistent trend over time, indicating factors like seasonality, repeated interactions, or changes in dyad dynamics.
    • Approach: We'll create time-series plots to visualize how these behaviors change across trials and experiment days.
  2. Analyzing Behavioral Trends by Dyad: Each dyad may exhibit different patterns, so we separate the trends by dyad.
    • Goal: Identify whether some dyads are consistently tolerant, aggressive, or show variations depending on conditions like audience or time.
    • Approach: Create faceted plots, where each facet represents a different dyad, showing the evolution of behaviors like distance, tolerance, and aggression.
  3. Investigating the Impact of Audience: Understanding how audience composition influences behaviors like tolerance or aggression is crucial.
    • Goal: Determine if audience types (e.g., kin vs. non-kin) have a statistically significant effect on behaviors.
    • Approach: Use boxplots and bar charts to show the relationship between audience composition (kin vs. non-kin) and behaviors.
  4. Postulate and Refine Hypotheses: After the initial EDA, we'll refine the hypotheses and decide on the most suitable statistical models.
    • Examples of hypotheses: Tolerance is associated with shorter dyad distances over time. Audience presence (kin vs. non-kin) influences the likelihood of aggression.
  5. Preparing for Mixed-Effects Models: Once the EDA provides insights, we can set up mixed-effects models that account for repeated measures within each dyad, with random effects to capture variability across dyads or trials.
  6. Outlier Detection and Analysis: Outliers might distort the analysis, so we'll need to handle them.
    • Goal: Identify and handle influential outliers that could bias our models.
    • Approach: Use boxplots and statistical diagnostics (e.g., Cook's distance) to detect outliers.
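For the Cook's distance diagnostic mentioned above, a minimal sketch on the dyad-level regression (rounded values; the 4/n cutoff is one common rule of thumb, not the only choice):

```r
# Rounded per-dyad values standing in for the real dyad_summary
dyad_summary <- data.frame(
  AgeDifference       = c(3.80, 4.87, 1.97, 0.986, 10.1, 0.255, 3.21, 4.95),
  ToleranceProportion = c(0.684, 0.784, 0.665, 0.562, 0.584, 0.562, 0.831, 0.665)
)
fit <- lm(ToleranceProportion ~ AgeDifference, data = dyad_summary)

# Cook's distance per observation; flag points above the 4/n rule of thumb
cd          <- cooks.distance(fit)
influential <- which(cd > 4 / nrow(dyad_summary))
```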

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following objects are masked from 'package:base':
## 
##     attr, order
## Loading required package: network
## 
## 'network' 1.18.2 (2023-12-04), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
## sna: Tools for Social Network Analysis
## Version 2.7-2 created on 2023-12-05.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## [1] "2020-01-02" "2024-04-15"
## [1] "2020-01-01" "2024-04-20"
## [1] "2020-01-01" "2024-04-20"
## [1] "2020-01-01" "2024-04-20"
## [1] "2022-03-01" "2023-10-01"
## [1] "2022-03-01" "2023-10-01"
## [1] "2022-03-01" "2023-10-01"
## [1] "2022-03-01" "2023-10-01"
## Presence data supplied, see below for details
## Everything seems to be fine with the interaction sequence...OK
## 
## #####################################
## 
## Presence starts earlier than data...WARNING
## Presence continues beyond data...WARNING
## IDs in datasequence and presence do not match!
## The following IDs occur in the presence data but NOT in the data sequence:...WARNING
##     Ati, BBGubh23, BBNdaw22, BBNkos23, BGug19, BNda21, BNya20, Buk, Gha, Ghi, Gib, Giji, Gil, Gobe, Gom, Gon, Gub, Gugu, Gus, Guz, Hlu, Kek, Mat, Nak, Nca, Nci, Nda, Ndi, Ndik, Ndin, Ndl, Ndo, Ndum, Nge, Nko, Nkun, Nyal, Nyan, Sho, Tch, Twe, Vla, Yan 
## 
## #####################################
## Presence data supplied, see below for details
## Everything seems to be fine with the interaction sequence...OK
## 
## #####################################
## 
## Presence starts earlier than data...WARNING
## Presence continues beyond data...WARNING
## IDs in datasequence and presence do not match!
## The following IDs occur in the presence data but NOT in the data sequence:...WARNING
##     Aal, Aan, Add, Apa, App, Ard, Ask, Atj, Bas, BBDian22, BBNaal23, BBSari23, BBSkem23, BHee19, Bob, BOer21, Boo, BOul21, Bra, BSie16, Dal, Dix, Dok, Dri, Eie, Eis, Ekse, Ela, Fen, Flu, Glo, Goe, Haai, Han, Hee, Heli, Hem, Her, Hia, Hibi, Hipp, Hot, Kno, Kom, Mat, Mimi, Misk, Mui, Naa, Naga, Nak, Nami, Neu, Nge, Non, Noo, Nucl, Nuk, Nul, Nuu, Oase, Olyf, Ome, Ott, Ouma, Oup, Padk, Pal, Papp, Pepe, Pie, Pikk, Pix, PlainJane, Pom, Popp, Potj, Pro, Prui, Pur, Rat, Ree, Reno, Rhe, Rid, Rimp, Rivi, Rooi, Ros, Samp, Sey, Siel, Sig, Sil, Sitr, Sla, Span, Spe, Syl, Ted, Tot, Tow, Ubu, Umb, War, Win, Xia, Xin, Xiu 
## 
## #####################################
## Presence data supplied, see below for details
## Everything seems to be fine with the interaction sequence...OK
## 
## #####################################
## 
## Presence continues beyond data...WARNING
## IDs in datasequence and presence do not match!
## The following IDs occur in the presence data but NOT in the data sequence:...WARNING
##     Bang, BBGran22, BBGran23, BBGree23, BBOort22, BBRegi23, BBXala23, BBXian22, BBXimp23, Bet, BGua20, BGua21, BPre21, BRen20, BRos20, Cai, Can, Cus, Dak, Fle, Gab, Gale, Gan, Gree, Gri, Grim, Griv, Gua, Hav, Kek, Kny, Lif, Lima, Lip, Lug, NewMale3, Oua, Palm, Pom, Prag, Pri, Prim, Pru, Pye, Rafa, Ram, Ren, Renn, Reva, Rey, Rim, Ris, Riv, Riva, Roc, Roes, Roma, Rosl, Sio, Tam, Tch, Udi, Udup, Ula, Uls, Umt, Utic, Utr, Vla, Vry, Vul, War, Was, Xal, Xar, Xati, Xeni, Xia, Xih, Xin, Xop, Yan 
## 
## #####################################
## [1] "Ndaw" "Ginq" "Ghid" "Gubh" "Godu" "Ncok" "Ndon" "Guba" "Nkos"

##  Ginq  Godu  Ghid  Ncok  Guba  Gubh  Ndon  Ndaw  Nkos 
## 0.912 0.548 0.519 0.447 0.444 0.412 0.365 0.312 0.075
##  Obse  Oort  Ouli  Puol  Aapi  Sirk  Miel  Asis  Piep  Skem  Heer  Reen  Oerw 
## 0.917 0.684 0.679 0.576 0.557 0.463 0.463 0.457 0.421 0.417 0.409 0.405 0.404 
##  Lewe  Naal  Rede  Hond  Numb  Nooi  Gese  Sari  Riss  Enge  Pann  Nurk  Eina 
## 0.399 0.386 0.385 0.366 0.355 0.314 0.272 0.240 0.233 0.231 0.207 0.154 0.133

##  Gran  Guat  Prai  Upps  Gaya  Xala  Pret  Xinp  Gris  Beir  Prat  Regi  Xian 
## 0.893 0.659 0.526 0.474 0.461 0.454 0.449 0.447 0.447 0.447 0.442 0.433 0.416 
##  Bela  Raba  Rioj 
## 0.285 0.239 0.076

## [1] "Sho" "Vla" "Buk"

##   Buk   Vla   Sho 
## 0.934 0.484 0.072

TO SORT; NOT FINISHED

  • I want to use the date to know how many sessions have been done with each dyad in my experiment.
  • I will create a variable called Session, where 1 session = 1 day.
  • The data contains values from the 14th of September 2022 until the 13th of September 2023.
  • I will also create a variable called Trial to know how many trials have been done with each dyad, where 1 row = 1 trial.
  • In parallel to my hypotheses, I may separate the data into *4 seasons* (12 months of data split into 4 categories) to make a preliminary check of a potential effect of seasonality. Nevertheless, the fact that we did not use any tools to measure the weather, and that a categorization into 4 seasons ignores the actual temperature, food quantity and other elements related to seasonality, makes this categorization quite arbitrary. I may do it, but with no intention to include it in my scientific report.
  • But before that, I may already want to make a few changes by merging MaleCorn and "Male placement corn" into "MaleCorn", and maybe replacing all of the NA's in "OtherResponse" by a response.
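A minimal base-R sketch of these steps on a toy data frame (the real data frame is my imported dataset; the `Dyad`, `Session` and `Trial` column names and the toy values here are illustrative assumptions, not final code):

```r
# Toy example mirroring the structure of the box experiment data
bex <- data.frame(
  Date              = as.Date(c("2022-09-27", "2022-09-27", "2022-09-28", "2022-09-27")),
  MaleID            = c("Nge", "Nge", "Nge", "Xop"),
  FemaleID          = c("Oerw", "Oerw", "Oerw", "Oort"),
  MaleCorn          = c(3, NA, 3, NA),
  MalePlacementCorn = c(NA, 2, NA, 1)
)

# Dyad label: one string per male-female pair
bex$Dyad <- paste(bex$MaleID, bex$FemaleID, sep = "-")

# Session: 1 session = 1 day, numbered within each dyad
bex$Session <- ave(as.numeric(bex$Date), bex$Dyad,
                   FUN = function(d) match(d, sort(unique(d))))

# Trial: 1 row = 1 trial, running count within each dyad
bex$Trial <- ave(seq_len(nrow(bex)), bex$Dyad, FUN = seq_along)

# Merge the two corn columns, keeping the non-missing value
bex$MaleCorn <- ifelse(is.na(bex$MaleCorn), bex$MalePlacementCorn, bex$MaleCorn)
```

The same steps could be written with dplyr (`group_by` + `mutate`, and `coalesce` for the corn merge); the base-R version is shown only to keep the sketch dependency-free.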

Lines to check unique values in MaleID and FemaleID to see if there are any problems with them:

# Unique values in MaleID
unique_male_ids <- unique(BexClean$MaleID)

# Unique values in FemaleID
unique_female_ids <- unique(BexClean$FemaleID)

The sections below are placeholders for the organization of my paper and will be worked on once the data cleaning and exploration are done

5. Describing the data

6. Research question & Hypothesis

Research question

  • What factors influence the rate at which individuals (vervets) learn to tolerate each other in a controlled box experiment?

  • Ex: The rate at which individuals (vervets) learn to tolerate each other in a box experiment is influenced by social factors (audience, social network, behavior of the partner) and idiosyncratic factors (age, rank)

Hypothesis

    1. Hypothesis about the Presence of High-Ranking Individuals:

The presence of a higher number of high-ranking individuals in the audience will negatively correlate with the level of tolerance achieved among vervets in the box experiment. This is expected to result in higher frequencies of aggressive behaviors, intrusions, and loss of interest, particularly from lower-ranking individuals.

    2. Hypothesis about Partner Agonistic Behaviors:

Vervets' tolerance levels in the box experiment will be influenced by their partner's display of agonistic behaviors. Specifically, a partner who exhibits more frequent agonistic behaviors will decrease the other individual's motivation to participate in future trials.

    3. Hypothesis about the Establishment of an Optimal Distance:

During the box experiment, vervet dyads will establish an “optimal” distance for interaction, characterized by a higher frequency of tolerance compared to other distances. This optimal distance is expected to signify that the individuals tolerate each other more effectively at this specific proximity.

    4. Hypothesis about Age and Rank:

The age and rank of individual vervets within the group will influence the success of the trials in the box experiment. Specifically, older and higher-ranking individuals are expected to exhibit lower rates of success compared to dyads consisting of younger and lower-ranked individuals. This decrease in success is anticipated to be associated with a higher frequency of aggressive behaviors displayed by older and higher-ranking individuals towards their partners. (I’m not sure this hypothesis makes sense; I have the feeling age and rank must have an influence, but I don’t know how to put it. I will think about it.)

    5. Hypothesis about seasonality:

Seasonality is expected to impact the motivation of vervet dyads to participate in the box experiment. We hypothesize that dyads will have lower motivation, as indicated by a reduced number of trials, during the summer months compared to the winter months. This difference in motivation is likely influenced by temperature and food availability. To test this hypothesis, we will categorize the data into four seasonal periods, each spanning three months, and analyze whether there is a significant effect of seasonality on the motivation to engage in the trials.
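A minimal sketch of that categorization, assuming Southern-Hemisphere season labels and a month-to-season grouping that would still need to be justified:

```r
# Map a date to one of four 3-month seasons (Southern Hemisphere labels;
# the month-to-season grouping is an assumption, not a measured boundary)
season_of <- function(date) {
  m <- as.integer(format(as.Date(date), "%m"))
  c("Summer", "Summer", "Autumn", "Autumn", "Autumn", "Winter",
    "Winter", "Winter", "Spring", "Spring", "Spring", "Summer")[m]
}

season_of(as.Date("2022-09-27"))  # "Spring"
```

Applied to the Date column, this would give one new categorical variable (e.g. `Season <- season_of(BexClean$Date)`) to group trials by.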

7. Statistical tests and analysis of the data

Statistical tests

  • Hypothesis 1: Influence of High-Ranking Individuals

Variables Needed:

  • DyadResponse (specifically, “aggression” responses)
  • Amountaudience (to measure the number of individuals in the audience)
  • Audience…15 (to identify the names of individuals in the audience for calculating dominance ranks)
  • Elo ratings of the individuals based on the ad libitum data collected at IVP (which I have to calculate asap)

Statistical Analysis: Logistic regression, as it can analyze the influence of high-ranking individuals on the occurrence of aggression in dyad responses. This will help determine whether the presence of high-ranking individuals affects the likelihood of aggression.
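A hedged sketch of what that model could look like, run on simulated stand-in data (`aggression` and `n_highrank` are hypothetical column names that would be derived from DyadResponse and from the Elo ratings of the audience IDs):

```r
set.seed(1)
# Simulated stand-in data: 100 trials, aggression made more likely
# as the number of high-ranking audience members grows
trials <- data.frame(n_highrank = rep(0:4, each = 20))
trials$aggression <- rbinom(100, 1, plogis(-1 + 0.5 * trials$n_highrank))

# Logistic regression: does audience rank composition predict aggression?
fit <- glm(aggression ~ n_highrank, data = trials, family = binomial)
exp(coef(fit))["n_highrank"]  # odds ratio per additional high-ranker
```

With the real data, `summary(fit)` would give the direction and significance of the `n_highrank` effect; a random effect for dyad (e.g. a GLMM via lme4) may be needed because trials are repeated within dyads.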

  • Hypothesis 2: Impact of Partner’s Agonistic Behaviors

Variables Needed:

  • DyadResponse (specifically, “aggression” responses)
  • MaleagressF (male’s aggression towards female)
  • FemaleaggressM (female’s aggression towards male)

Statistical Analysis: Logistic regression, as it can be used to assess how the occurrence of aggression in dyad responses is influenced by the partner’s sex-specific agonistic behaviors.

  • Hypothesis 3: Identification of an Optimal Interaction Distance

Variables Needed:

  • DyadDistance (distance between boxes)
  • Tolerance (as a binary outcome)

Statistical Analysis: Generalized Linear Model (GLM) to investigate whether there is an optimal distance that leads to a higher likelihood of tolerance (Tolerance = 1).
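One way to formalize an “optimal” distance (a peak rather than a monotone trend) is a quadratic distance term in a logistic GLM: a negative coefficient on the squared distance, with the vertex inside the tested range, is consistent with a peak. A sketch on simulated data, assuming DyadDistance has been parsed to numeric metres (e.g. "2m" → 2):

```r
set.seed(2)
# Simulated stand-in data with peak tolerance around 1 m
d   <- rep(0:3, each = 25)
tol <- rbinom(100, 1, plogis(1 - (d - 1)^2))

# Quadratic logistic model; the vertex estimates the "optimal" distance
fit  <- glm(tol ~ d + I(d^2), family = binomial)
peak <- -coef(fit)["d"] / (2 * coef(fit)["I(d^2)"])
```

An alternative is to treat distance as a factor (`glm(tol ~ factor(d), family = binomial)`) and compare the per-distance tolerance probabilities directly, which avoids assuming a quadratic shape.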

  • Hypothesis 4: Role of Age and Rank

Variables Needed:

  • Tolerance (as a binary outcome)
  • Male and Female (to identify individuals’ ages and ranks)
  • Dyad (to link individuals to dyads)
  • Birthdate to calculate the age of each individual

Statistical Analysis: Logistic regression can be employed to determine whether the age and rank of individual vervets within dyads have an impact on the likelihood of tolerance (Tolerance = 1).
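For the age part, a small base-R helper could compute each individual's age at the trial date from Birthdate (a sketch; the column names and the 365.25-day year are assumptions):

```r
# Age in years at a given trial date, from a birthdate
# (365.25 days/year averages over leap years; an approximation,
# not an exact calendar age)
age_years <- function(birthdate, on) {
  as.numeric(difftime(as.Date(on), as.Date(birthdate), units = "days")) / 365.25
}

round(age_years("2016-11-03", "2022-09-27"), 1)  # 5.9
```

The resulting ages (and the Elo-based ranks) would then enter the logistic regression as predictors alongside the Tolerance outcome.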

  • Hypothesis 5: Influence of Seasonality

Variables Needed:

  • Date (to categorize data into seasons)
  • Trial (to count the number of trials in each season), plus data covering at least 365 days so I can split the year into 4 seasons of 3 months each, to see if there may be an effect of seasonality on the motivation (amount of trials) of the dyads

Statistical Analysis:

ANOVA or Kruskal-Wallis test: depending on the distribution of the trial counts, either ANOVA (if the data are normally distributed) or the Kruskal-Wallis test (for non-normally distributed data) can assess the impact of seasonality on the number of trials. If significant differences are found, post-hoc tests can identify which seasons differ from each other. The effectiveness of these analyses will depend on the distribution of the data, so some exploratory analysis (e.g., visualization) beforehand would help. –> I took this suggestion from ChatGPT, I have to look more into it
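A runnable sketch of the Kruskal-Wallis option on made-up trial counts (the real counts would come from the Trial variable aggregated per session, with the season labels derived from Date):

```r
# Made-up trial counts per session, grouped by season (illustrative only)
trials_per_session <- c(12, 9, 14, 11, 5, 4, 6, 7, 10, 11, 8, 13)
season <- factor(rep(c("Summer", "Winter", "Spring"), each = 4))

# Kruskal-Wallis: do trial counts differ across seasons?
kw <- kruskal.test(trials_per_session ~ season)
kw$p.value  # small values would suggest a seasonal effect
```

If the test is significant, pairwise post-hoc comparisons (e.g. `pairwise.wilcox.test` with a p-value correction) can identify which seasons differ.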

REMARKS: So here are a few updates I made in the document. I also plan to send my cleaned data to Radu (the statistician at UNINE), as he was keen to help me find the right tests. Of course, I will also look again into Bshary’s and Charlotte’s work with the boxes and improve these suggestions, which are quite simple for now

Also, I still have to clean the last graphs about male/female aggression, as I didn’t finish that yet. I just wanted to share my hypotheses and ideas for statistics so I can soon get into the “serious” work

Anyway, thank you in advance for your help <3

Michael

8. Plotting the results of the analysis

9. Interpretation of the results

10. Comeback on the research question and hypothesis

11. Bibliography

12. Organization for my paper

  • Introduction
    • Tolerance humans, primates
    • Apes vs monkeys / Captivity vs Wild
    • IVP: Wild habituated vervets, experiments possible
    • Paper Bshary, Canteloup… Prolongation study
    • Relevance idea/topic research
    • Research question & hypothesis

But: the intro needs a triangle shape, broad to narrow, ending with the research question > importance of tolerance > animal kingdom, current knowledge / direction of knowledge we need > show how my experiment goes in that direction. How to address the gap, answer with the research question

Then explain why vervet monkeys were chosen (IVP in methods), their sociality, experiments already made

  • Methods

    • IVP, research area, (goal, house, type people)

    • Population: groups, dyads, male/female, ranks..

    • Box material: boxes, remotes, batteries, camera, tripod, corn (no marmalade ;), (water spray, for security reasons, a non-aggressive way to select individuals and not engage with monkeys when recharging boxes with corn), pattern, previous distances, tablets, box experiment form

    • Tablets

    • (No observers mentioned)

    • Habituation to boxes > individuals trained to recognize the boxes; they have different levels of habituation

    • Patterns > appendix; mention similar to habituation, used to recognize the box, but efficiency depends on experience

    • Selection of dyads > assignment from Elo rating (different ranks); if above-average bond, no dyad made; where that was not possible, availability of the monkeys was also a factor !! Non-random selection can be a problem, think about why and how you selected the data. We created variation in dyads through different sex, rank and not-above-average bonds (calculate bondedness)

    • Amount of corn, do you want to mention it > maybe important. Calculate corn during trials and during placement, cf. paper on corn / food motivation

    • Corn (% of the daily vervet intake made from corn, cf. the site we saw, cf. screenshot, compare with papers previously made and all)

    • 1st dyad trial (BD) > appendix

    • Videos > details appendix

    • Finding dyads > appendix

    • Placement to attract them > mention if statistics are made on corn placement

    • Trials (1 session = max 15 trials in total) (a session could be broken into different sub-sessions to reach the 15-trial maximum)

    • If aggression > 1m / If 2x tolerance < 1m; also if not approaching > 1m (if no tolerance, increase the distance, except if intrusion) (Borgeaud > expectation of aggression)

    • Time of the day > appendix

    • Territory? > appendix

    • Amount of sessions per day/week, how we chose the moment to follow them > appendix

    • Problems / unplanned events: weather, BGE’s, not finding the monkeys (group, dyad or individual), dispersal of males, river crossing, inaccessibility (experiments or boxes), low vision (experiments or monkeys) > appendix

    • (Where do I mention the confounding variables?) > look in the literature; if something could have an effect and is already reported in papers, check it, otherwise exclude “normal life” factors for both monkeys and experimenter

    • Types of experimental plan

    • Statistical tests (for each hypothesis)

  • Analysis

  • Results

  • Interpretation

  • Conclusion

Key cognitive skills that may be involved in the box experiment

  • Problem solving
  • Memory
  • Conflict resolution
  • Cooperation

Key ideas to develop & insights

  • Intersexual food competition and/or tolerance: which type of adaptation

  • Tolerance

  • Adaptation to a new context (box experiment setting and repeated encounters with a specific individual)

  • Evolution of tolerance in competitive context

  • Social cognition > meaning the influence of the group or audience on the choice of an individual of a dyad

  • Primate social structure and dynamics

  • Foundation of tolerance, solving of competition related to food, mechanisms underlying male-female vervet relationships, dyadic interactions

  • Insights on the evolution of complex decision making in a social context and adaptive mechanisms related to food competition

In summary, studying dyadic interactions, particularly between males and females in vervet monkeys, not only enhances our understanding of evolutionary origins and adaptive advantages of complex cognition but also provides valuable insights into human social behavior, cooperation, and cognitive foundations shared across species. These studies bridge the gap between animal behavior research and cognitive science, offering interdisciplinary perspectives on the complexities of social interactions and their implications for both animal and human societies.

Glossary

  • Tolerance: An individual has an encounter with a conspecific and can freely leave but remains in the encounter without acting aggressively toward the conspecific. (Pisor & Surbeck, 2019)
  • Aggression
  • Session
  • Trial
  • Group: In the Primate order, groups are individuals “which remain [physically] together in or separate from a larger unit” and interact with each other more than with other individuals. This definition does not cover all uses of the word “group” in the social sciences (e.g., human identity groups who identify with a common name or symbol may or may not interact with one another more frequently than with other individuals). Because of this ambiguity, we use the word “community” when referring to humans to better capture the notion of spatial proximity. Members of the same group are referred to as “same-group” and those from another group “extra-group.” (Pisor & Surbeck, 2019)

Bibliography

• Pisor, A. C., & Surbeck, M. (2019). The evolution of intergroup tolerance in nonhuman primates and humans. Evolutionary Anthropology: Issues and Reviews. Advance online publication. https://doi.org/10.1002/evan.21793

Annex

Annex 1 : View of the dataset when imported - First 6 entries of each variable

  • We can see here a brief view of the original dataset, named BoxEx, when I initially imported it, as seen in section 0: Opening the data
First few entries (continued below)

| Date       | Time                | Data           | Group       |
|------------|---------------------|----------------|-------------|
| 2022-09-27 | 1899-12-31 09:47:50 | Box Experiment | Baie Dankie |
| 2022-09-27 | 1899-12-31 09:50:07 | Box Experiment | Baie Dankie |
| 2022-09-27 | 1899-12-31 09:53:11 | Box Experiment | Baie Dankie |
| 2022-09-27 | 1899-12-31 09:54:28 | Box Experiment | Baie Dankie |
| 2022-09-27 | 1899-12-31 09:55:19 | Box Experiment | Baie Dankie |
| 2022-09-27 | 1899-12-31 09:56:56 | Box Experiment | Baie Dankie |

Table continues below

| GPSS                | GPSE               | MaleID | FemaleID |
|---------------------|--------------------|--------|----------|
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |
| -28.010549999999999 | 31.191050000000001 | Nge    | Oerw     |

Table continues below

| Male placement corn | MaleCorn | FemaleCorn | DyadDistance | DyadResponse |
|---------------------|----------|------------|--------------|--------------|
| NA                  | 3        | NA         | 2m           | Tolerance    |
| NA                  | 3        | NA         | 2m           | Tolerance    |
| NA                  | 3        | NA         | 1m           | Tolerance    |
| NA                  | 3        | NA         | 1m           | Tolerance    |
| NA                  | 3        | NA         | 0m           | Tolerance    |
| NA                  | 3        | NA         | 0m           | Tolerance    |

Table continues below

| OtherResponse | Audience        | IDIndividual1 | IntruderID |
|---------------|-----------------|---------------|------------|
| NA            | Obse; Oup; Sirk | NA            | NA         |
| NA            | Obse; Oup; Sirk | NA            | NA         |
| NA            | Oup; Sirk       | NA            | NA         |
| NA            | Sirk            | NA            | NA         |
| NA            | Sey; Sirk       | NA            | NA         |
| NA            | Sey; Sirk       | NA            | NA         |

Table continues below

| Remarks |
|---------|
| NA |
| NA |
| Nge box did not open because of the battery. Oerw vocalized to MA when he ap to the box to open it. |
| Sey came to the boxes once they were open |
| NA |
| NA |

| Observers                     | DeviceId                               |
|-------------------------------|----------------------------------------|
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |
| Josefien; Michael; Ona; Zonke | {7A4E6639-7387-7648-88EC-7FD27A0F258A} |